linux/net/ipv4
David S. Miller 62fa8a846d net: Implement read-only protection and COW'ing of metrics.
Routing metrics are now copy-on-write.

Initially a route entry points it's metrics at a read-only location.
If a routing table entry exists, it will point there.  Else it will
point at the all zero metric place-holder called 'dst_default_metrics'.

The writeability state of the metrics is stored in the low bits of the
metrics pointer, we have two bits left to spare if we want to store
more states.

For the initial implementation, COW is implemented simply via kmalloc.
However future enhancements will change this to place the writable
metrics somewhere else, in order to increase sharing.  Very likely
this "somewhere else" will be the inetpeer cache.

Note also that this means that metrics updates may transiently fail
if we cannot COW the metrics successfully.

But even by itself, this patch should decrease memory usage and
increase cache locality especially for routing workloads.  In those
cases the read-only metric copies stay in place and never get written
to.

TCP workloads where metrics get updated, and those rare cases where
PMTU triggers occur, will take a very slight performance hit.  But
that hit will be alleviated when the long-term writable metrics
move to a more sharable location.

Since the metrics storage went from a u32 array of RTAX_MAX entries to
what is essentially a pointer, some retooling of the dst_entry layout
was necessary.

Most importantly, we need to preserve the alignment of the reference
count so that it doesn't share cache lines with the read-mostly state,
as per Eric Dumazet's alignment assertion checks.

The only non-trivial bit here is the move of the 'flags' member into
the writeable cacheline.  This is OK since we are always accessing the
flags around the same moment when we made a modification to the
reference count.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-26 20:51:05 -08:00
..
netfilter netfilter: nf_nat: place conntrack in source hash after SNAT is done 2011-01-20 15:49:52 +01:00
af_inet.c net: change netdev->features to u32 2011-01-24 15:32:47 -08:00
ah4.c ah: reload pointers to skb data after calling skb_cow_data() 2011-01-11 14:03:10 -08:00
arp.c net: arp_ioctl() must hold RTNL 2011-01-24 13:16:16 -08:00
cipso_ipv4.c Update broken web addresses in the kernel. 2010-10-18 11:03:14 +02:00
datagram.c net: return operator cleanup 2010-09-23 14:33:39 -07:00
devinet.c ipv4: Don't pre-seed hoplimit metric. 2010-12-12 22:08:17 -08:00
esp4.c xfrm: Traffic Flow Confidentiality for IPv4 ESP 2010-12-10 14:43:59 -08:00
fib_frontend.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-12-26 22:37:05 -08:00
fib_hash.c fib: Fix fib zone and its hash leak on namespace stop 2010-10-28 10:27:03 -07:00
fib_lookup.h fib: fib_result_assign() should not change fib refcounts 2010-11-04 12:05:32 -07:00
fib_rules.c netfilter: fix Kconfig dependencies 2011-01-14 13:36:42 +01:00
fib_semantics.c Merge branch 'master' of git://1984.lsi.us.es/net-next-2.6 2011-01-14 14:12:37 +01:00
fib_trie.c net: allow GFP_HIGHMEM in __vmalloc() 2010-11-21 10:04:04 -08:00
gre.c tunnels: add _rcu annotations 2010-10-25 13:09:45 -07:00
icmp.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-11-19 13:13:47 -08:00
igmp.c igmp: refine skb allocations 2010-11-18 11:02:23 -08:00
inet_connection_sock.c tcp: disallow bind() to reuse addr/port 2011-01-11 14:03:07 -08:00
inet_diag.c Revert "netlink: test for all flags of the NLM_F_DUMP composite" 2011-01-19 13:34:20 -08:00
inet_fragment.c net/ipv4: EXPORT_SYMBOL cleanups 2010-07-12 12:57:54 -07:00
inet_hashtables.c inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners 2010-11-28 18:18:44 -08:00
inet_lro.c
inet_timewait_sock.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
inetpeer.c inetpeer: Use correct AVL tree base pointer in inet_getpeer(). 2011-01-24 14:38:09 -08:00
ip_forward.c net-next: remove useless union keyword 2010-06-10 23:31:35 -07:00
ip_fragment.c ipv4: IP defragmentation must be ECN aware 2011-01-06 11:21:30 -08:00
ip_gre.c ipv4: Don't pre-seed hoplimit metric. 2010-12-12 22:08:17 -08:00
ip_input.c netfilter: fix Kconfig dependencies 2011-01-14 13:36:42 +01:00
ip_options.c bridge : Sanitize skb before it enters the IP stack 2010-09-19 12:42:34 -07:00
ip_output.c ipv4: Don't pre-seed hoplimit metric. 2010-12-12 22:08:17 -08:00
ip_sockglue.c ipv4: add __rcu annotations to ip_ra_chain 2010-10-25 14:18:28 -07:00
ipcomp.c xfrm: SA lookups signature with mark 2010-02-22 16:20:22 -08:00
ipconfig.c net: add some KERN_CONT markers to continuation lines 2010-11-28 10:47:17 -08:00
ipip.c ipip: add module alias for tunl0 tunnel device 2010-12-01 12:53:23 -08:00
ipmr.c net: use the macros defined for the members of flowi 2010-11-17 12:27:45 -08:00
Kconfig Merge branch 'master' of /repos/git/net-next-2.6 2011-01-19 23:51:37 +01:00
Makefile PPTP: PPP over IPv4 (Point-to-Point Tunneling Protocol) 2010-08-21 23:05:39 -07:00
netfilter.c net: use the macros defined for the members of flowi 2010-11-17 12:27:45 -08:00
proc.c tcp: Replace time wait bucket msg by counter 2010-12-08 12:16:33 -08:00
protocol.c net: add __rcu annotations to protocol 2010-10-27 11:37:31 -07:00
raw.c net: use the macros defined for the members of flowi 2010-11-17 12:27:45 -08:00
route.c net: Implement read-only protection and COW'ing of metrics. 2011-01-26 20:51:05 -08:00
syncookies.c net: use the macros defined for the members of flowi 2010-11-17 12:27:45 -08:00
sysctl_net_ipv4.c net: add limits to ip_default_ttl 2010-12-13 12:16:14 -08:00
tcp.c net: change netdev->features to u32 2011-01-24 15:32:47 -08:00
tcp_bic.c
tcp_cong.c net/ipv4: Eliminate kstrdup memory leak 2010-08-27 19:31:56 -07:00
tcp_cubic.c
tcp_diag.c tcp: diag: Dont report negative values for rx queue 2009-12-03 16:06:13 -08:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c TCP: tcp_hybla: Fix integer overflow in slow start increment 2010-06-02 07:15:48 -07:00
tcp_illinois.c Update broken web addresses in the kernel. 2010-10-18 11:03:14 +02:00
tcp_input.c TCP: fix a bug that triggers large number of TCP RST by mistake 2011-01-25 13:46:30 -08:00
tcp_ipv4.c tcp: fix bug in listening_get_next() 2011-01-24 14:41:20 -08:00
tcp_lp.c
tcp_minisocks.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-12-08 13:47:38 -08:00
tcp_output.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-01-13 10:05:56 -08:00
tcp_probe.c net: ipv4: tcp_probe: cleanup snprintf() use 2010-11-17 12:27:46 -08:00
tcp_scalable.c
tcp_timer.c tcp: use correct counters in CA_CWR state too 2010-10-17 13:46:33 -07:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c Update broken web addresses in the kernel. 2010-10-18 11:03:14 +02:00
tcp_westwood.c net: return operator cleanup 2010-09-23 14:33:39 -07:00
tcp_yeah.c
tunnel4.c tunnels: add __rcu annotations 2010-10-27 11:37:32 -07:00
udp.c net: change netdev->features to u32 2011-01-24 15:32:47 -08:00
udp_impl.h
udplite.c net: fix nulls list corruptions in sk_prot_alloc 2010-12-16 14:26:56 -08:00
xfrm4_input.c net/ipv4: EXPORT_SYMBOL cleanups 2010-07-12 12:57:54 -07:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c ipv4: Don't pre-seed hoplimit metric. 2010-12-12 22:08:17 -08:00
xfrm4_output.c netfilter: ipv4: use NFPROTO values for NF_HOOK invocation 2010-03-25 16:00:30 +01:00
xfrm4_policy.c net: Implement read-only protection and COW'ing of metrics. 2011-01-26 20:51:05 -08:00
xfrm4_state.c xfrm: Allow different selector family in temporary state 2010-09-20 11:11:38 -07:00
xfrm4_tunnel.c net: struct xfrm_tunnel in read_mostly section 2010-08-30 13:50:45 -07:00