#
# IP Virtual Server configuration
#
menuconfig IP_VS
	tristate "IP virtual server support"
	depends on NET && INET && NETFILTER
	---help---
	  IP Virtual Server support will let you build a high-performance
	  virtual server based on a cluster of two or more real servers. This
	  option must be enabled for at least one of the clustered computers
	  that will take care of intercepting incoming connections to a
	  single IP address and scheduling them to real servers.

	  Three request dispatching techniques are implemented: virtual
	  server via NAT, virtual server via tunneling and virtual server
	  via direct routing. Several scheduling algorithms can be used to
	  choose which server a connection is directed to, so that load
	  balancing can be achieved among the servers. For more information
	  and the administration program, please visit the following URL:
	  <http://www.linuxvirtualserver.org/>.

	  If you want to compile it into the kernel, say Y. To compile it
	  as a module, choose M here. If unsure, say N.

if IP_VS

config IP_VS_IPV6
	bool "IPv6 support for IPVS"
	depends on IPV6 = y || IP_VS = IPV6
	---help---
	  Add IPv6 support to IPVS. This is incomplete and might be dangerous.

	  See http://www.mindbasket.com/ipvs for more information.

	  Say N if unsure.

config IP_VS_DEBUG
	bool "IP virtual server debugging"
	---help---
	  Say Y here if you want to get additional messages useful in
	  debugging the IP virtual server code. You can change the debug
	  level in /proc/sys/net/ipv4/vs/debug_level

config IP_VS_TAB_BITS
	int "IPVS connection table size (the Nth power of 2)"
	range 8 20
	default 12
	---help---
	  The IPVS connection hash table uses the chaining scheme to handle
	  hash collisions. Using a big IPVS connection hash table will greatly
	  reduce conflicts when there are hundreds of thousands of connections
	  in the hash table.

	  Note that the table size must be a power of 2. The table size will
	  be 2 raised to the power of the number you enter. The number must
	  be between 8 and 20; the default is 12, which gives a table size of
	  4096. Don't choose too small a number, otherwise you will lose
	  performance. You can adapt the table size yourself, according to
	  your virtual server application. It is good to set the table size
	  not far below the number of connections per second multiplied by
	  the average time a connection stays in the table. For example, if
	  your virtual server gets 200 connections per second and a
	  connection stays in the connection table for 200 seconds on
	  average, the table size should be not far below 200x200 = 40000,
	  so 32768 (2**15) is a good choice.

	  Also note that each connection effectively occupies 128 bytes and
	  each hash entry uses 8 bytes, so you can estimate how much memory
	  is needed for your box.

	  You can override this number by setting the conn_tab_bits module
	  parameter, or by appending ip_vs.conn_tab_bits=? to the kernel
	  command line if IPVS was compiled built-in.

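# Worked sizing example for IP_VS_TAB_BITS (illustrative only). It reuses
# the figures quoted in the help text above (200 connections/s, 200 s
# average lifetime, 128 bytes per connection, 8 bytes per hash entry);
# actual sizes may differ between kernel versions and architectures.
#
#   concurrent connections ~ 200 * 200     = 40000
#   IP_VS_TAB_BITS = 15  ->  2**15         = 32768 hash entries
#   average chain length ~ 40000 / 32768   ~ 1.2 connections per entry
#   hash table memory    ~ 32768 * 8 B     = 262144 B (256 KiB)
#   connection memory    ~ 40000 * 128 B   = 5120000 B (about 4.9 MiB)
#
# The same value can be set without recompiling, as the help text notes:
#
#   ip_vs.conn_tab_bits=15           (kernel command line, IPVS built-in)
#   options ip_vs conn_tab_bits=15   (modprobe configuration, IPVS as module)
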
comment "IPVS transport protocol load balancing support"

config IP_VS_PROTO_TCP
	bool "TCP load balancing support"
	---help---
	  This option enables support for load balancing TCP transport
	  protocol. Say Y if unsure.

config IP_VS_PROTO_UDP
	bool "UDP load balancing support"
	---help---
	  This option enables support for load balancing UDP transport
	  protocol. Say Y if unsure.

config IP_VS_PROTO_AH_ESP
	def_bool IP_VS_PROTO_ESP || IP_VS_PROTO_AH

config IP_VS_PROTO_ESP
	bool "ESP load balancing support"
	---help---
	  This option enables support for load balancing ESP (Encapsulating
	  Security Payload) transport protocol. Say Y if unsure.

config IP_VS_PROTO_AH
	bool "AH load balancing support"
	---help---
	  This option enables support for load balancing AH (Authentication
	  Header) transport protocol. Say Y if unsure.

config IP_VS_PROTO_SCTP
	bool "SCTP load balancing support"
	select LIBCRC32C
	---help---
	  This option enables support for load balancing SCTP transport
	  protocol. Say Y if unsure.

comment "IPVS scheduler"

config IP_VS_RR
	tristate "round-robin scheduling"
	---help---
	  The round-robin scheduling algorithm simply directs network
	  connections to different real servers in a round-robin manner.

	  If you want to compile it into the kernel, say Y. To compile it
	  as a module, choose M here. If unsure, say N.

config IP_VS_WRR
	tristate "weighted round-robin scheduling"
	select GCD
	---help---
	  The weighted round-robin scheduling algorithm directs network
	  connections to different real servers based on server weights
	  in a round-robin manner. Servers with higher weights receive
	  new connections before those with lower weights, servers with
	  higher weights get more connections than those with lower
	  weights, and servers with equal weights get equal connections.

	  If you want to compile it into the kernel, say Y. To compile it
	  as a module, choose M here. If unsure, say N.

config IP_VS_LC
	tristate "least-connection scheduling"
	---help---
	  The least-connection scheduling algorithm directs network
	  connections to the server with the least number of active
	  connections.

	  If you want to compile it into the kernel, say Y. To compile it
	  as a module, choose M here. If unsure, say N.

config IP_VS_WLC
	tristate "weighted least-connection scheduling"
	---help---
	  The weighted least-connection scheduling algorithm directs network
	  connections to the server with the least active connections
	  normalized by the server weight.

	  If you want to compile it into the kernel, say Y. To compile it
	  as a module, choose M here. If unsure, say N.

config IP_VS_LBLC
	tristate "locality-based least-connection scheduling"
	---help---
	  The locality-based least-connection scheduling algorithm is for
	  destination IP load balancing. It is usually used in cache clusters.
	  This algorithm usually directs packets destined for an IP address
	  to its server if the server is alive and not overloaded. If the
	  server is overloaded (its number of active connections is larger
	  than its weight) and there is a server at less than half load,
	  then the weighted least-connection server is allocated to this
	  IP address.

	  If you want to compile it into the kernel, say Y. To compile it
	  as a module, choose M here. If unsure, say N.

config IP_VS_LBLCR
	tristate "locality-based least-connection with replication scheduling"
	---help---
	  The locality-based least-connection with replication scheduling
	  algorithm is also for destination IP load balancing. It is
	  usually used in cache clusters. It differs from the LBLC scheduling
	  as follows: the load balancer maintains mappings from a target
	  to a set of server nodes that can serve the target. Requests for
	  a target are assigned to the least-connection node in the target's
	  server set. If all the nodes in the server set are overloaded,
	  it picks a least-connection node in the cluster and adds it
	  to the server set for the target. If the server set has not been
	  modified for the specified time, the most loaded node is removed
	  from the server set, in order to avoid a high degree of replication.

	  If you want to compile it into the kernel, say Y. To compile it
	  as a module, choose M here. If unsure, say N.

config IP_VS_DH
	tristate "destination hashing scheduling"
	---help---
	  The destination hashing scheduling algorithm assigns network
	  connections to the servers through looking up a statically assigned
	  hash table by their destination IP addresses.

	  If you want to compile it into the kernel, say Y. To compile it
	  as a module, choose M here. If unsure, say N.

config IP_VS_SH
	tristate "source hashing scheduling"
	---help---
	  The source hashing scheduling algorithm assigns network
	  connections to the servers through looking up a statically assigned
	  hash table by their source IP addresses.

	  If you want to compile it into the kernel, say Y. To compile it
	  as a module, choose M here. If unsure, say N.

config IP_VS_SED
	tristate "shortest expected delay scheduling"
	---help---
	  The shortest expected delay scheduling algorithm assigns network
	  connections to the server with the shortest expected delay. The
	  expected delay that the job will experience is (Ci + 1) / Ui if
	  sent to the ith server, in which Ci is the number of connections
	  on the ith server and Ui is the fixed service rate (weight)
	  of the ith server.

	  If you want to compile it into the kernel, say Y. To compile it
	  as a module, choose M here. If unsure, say N.

config IP_VS_NQ
	tristate "never queue scheduling"
	---help---
	  The never queue scheduling algorithm adopts a two-speed model.
	  When there is an idle server available, the job will be sent to
	  the idle server, instead of waiting for a fast one. When there
	  is no idle server available, the job will be sent to the server
	  that minimizes its expected delay (the Shortest Expected Delay
	  scheduling algorithm).

	  If you want to compile it into the kernel, say Y. To compile it
	  as a module, choose M here. If unsure, say N.

comment 'IPVS application helper'

config IP_VS_FTP
	tristate "FTP protocol helper"
	depends on IP_VS_PROTO_TCP && NF_CONNTRACK && NF_NAT
	select IP_VS_NFCT
	---help---
	  FTP is a protocol that transfers IP addresses and/or port numbers
	  in the payload. In the virtual server via Network Address
	  Translation, the IP address and port number of real servers cannot
	  be sent to clients in FTP connections directly, so the FTP protocol
	  helper is required for tracking the connection and mangling it back
	  to that of the virtual service.

	  If you want to compile it into the kernel, say Y. To compile it
	  as a module, choose M here. If unsure, say N.

config IP_VS_NFCT
	bool "Netfilter connection tracking"
	depends on NF_CONNTRACK
	---help---
	  The Netfilter connection tracking support allows the IPVS
	  connection state to be exported to the Netfilter framework
	  for filtering purposes.

endif # IP_VS