move mcvlan out of experimental docs

Signed-off-by: allencloud <allen.sun@daocloud.io>
master
allencloud 2016-08-26 00:34:32 +08:00 committed by Tibor Vass
parent 1c60ac11d6
commit 6fe8c73b60
5 changed files with 13 additions and 265 deletions

View File

@ -72,7 +72,7 @@ to build a Docker binary with the experimental features enabled:
## Current experimental features
* [External graphdriver plugins](plugins_graphdriver.md)
* [Macvlan and Ipvlan Network Drivers](vlan-networks.md)
* [Ipvlan Network Drivers](vlan-networks.md)
* [Docker Stacks and Distributed Application Bundles](docker-stacks-and-bundles.md)
* [Checkpoint & Restore](checkpoint-restore.md)

File diff suppressed because one or more lines are too long

Binary file not shown.

Before

Width:  |  Height:  |  Size: 22 KiB

File diff suppressed because one or more lines are too long

Before

Width:  |  Height:  |  Size: 39 KiB

View File

@ -1,15 +1,12 @@
# Macvlan and Ipvlan Network Drivers
# Ipvlan Network Driver
### Getting Started
The Macvlan and Ipvlan drivers are currently in experimental mode in order to incubate Docker users use cases and vet the implementation to ensure a hardened, production ready driver in a future release. Libnetwork now gives users total control over both IPv4 and IPv6 addressing. The VLAN drivers build on top of that in giving operators complete control of layer 2 VLAN tagging and even Ipvlan L3 routing for users interested in underlay network integration. For overlay deployments that abstract away physical constraints see the [multi-host overlay ](https://docs.docker.com/engine/userguide/networking/get-started-overlay/) driver.
The Ipvlan driver is currently in experimental mode in order to incubate Docker users use cases and vet the implementation to ensure a hardened, production ready driver in a future release. Libnetwork now gives users total control over both IPv4 and IPv6 addressing. The VLAN driver builds on top of that in giving operators complete control of layer 2 VLAN tagging and even Ipvlan L3 routing for users interested in underlay network integration. For overlay deployments that abstract away physical constraints see the [multi-host overlay ](https://docs.docker.com/engine/userguide/networking/get-started-overlay/) driver.
Macvlan and Ipvlan are a new twist on the tried and true network virtualization technique. The Linux implementations are extremely lightweight because rather than using the traditional Linux bridge for isolation, they are simply associated to a Linux Ethernet interface or sub-interface to enforce separation between networks and connectivity to the physical network.
Macvlan and Ipvlan offer a number of unique features and plenty of room for further innovations with the various modes. Two high level advantages of these approaches are, the positive performance implications of bypassing the Linux bridge and the simplicity of having less moving parts. Removing the bridge that traditionally resides in between the Docker host NIC and container interface leaves a very simple setup consisting of container interfaces, attached directly to the Docker host interface. This result is easy access for external facing services as there is no port mappings in these scenarios.
Ipvlan is a new twist on the tried and true network virtualization technique. The Linux implementations are extremely lightweight because rather than using the traditional Linux bridge for isolation, they are simply associated to a Linux Ethernet interface or sub-interface to enforce separation between networks and connectivity to the physical network.
Ipvlan offers a number of unique features and plenty of room for further innovations with the various modes. Two high level advantages of these approaches are, the positive performance implications of bypassing the Linux bridge and the simplicity of having less moving parts. Removing the bridge that traditionally resides in between the Docker host NIC and container interface leaves a very simple setup consisting of container interfaces, attached directly to the Docker host interface. This result is easy access for external facing services as there is no port mappings in these scenarios.
### Pre-Requisites
@ -20,122 +17,11 @@ Macvlan and Ipvlan offer a number of unique features and plenty of room for furt
- Kernel requirements:
- To check your current kernel version, use `uname -r` to display your kernel version
- Macvlan Linux kernel v3.93.19 and 4.0+
- Ipvlan Linux kernel v4.2+ (support for earlier kernels exists but is buggy)
### MacVlan Bridge Mode Example Usage
Macvlan Bridge mode has a unique MAC address per container used to track MAC to port mappings by the Docker host. This is the largest difference from Ipvlan L2 mode which uses the same MAC address as the parent interface for each container `eth0` interface.
- Macvlan and Ipvlan driver networks are attached to a parent Docker host interface. Examples are a physical interface such as `eth0`, a sub-interface for 802.1q VLAN tagging like `eth0.10` (`.10` representing VLAN `10`) or even bonded host adaptors which bundle two Ethernet interfaces into a single logical interface.
- The specified gateway is external to the host provided by the network infrastructure.
- Each Macvlan Bridge mode Docker network is isolated from one another and there can be only one network attached to a parent interface at a time. There is a theoretical limit of 4,094 sub-interfaces per host adaptor that a Docker network could be attached to.
- It is not recommended to mix ipvlan and macvlan networks on the same `-o parent=` interface. Older kernel versions will throw uninformative netlink errors such as `device is busy`.
- Any container inside the same subnet can talk any other container in the same network without a gateway in both `macvlan bridge` mode and `ipvlan L2` modes.
- The same `docker network` commands apply to the vlan drivers. Some are irrelevant such as `-icc` or `--set-macaddress` for the Ipvlan driver.
- In Macvlan and Ipvlan L2 mode, containers on separate networks cannot reach one another without an external process routing between the two networks/subnets. This also applies to multiple subnets within the same `docker network`. See Ipvlan L3 mode for inter-subnet communications without a router.
In the following example, `eth0` on the docker host has an IP on the `172.16.86.0/24` network and a default gateway of `172.16.86.1`. The gateway is an external router with an address of `172.16.86.1`. An IP address is not required on the Docker host interface `eth0` in `bridge` mode, it merely needs to be on the proper upstream network to get forwarded by a network switch or network router.
![Simple Macvlan Bridge Mode Example](images/macvlan_bridge_simple.png)
**Note** For Macvlan bridge mode and Ipvlan L2 mode the subnet values need to match the NIC's interface of the Docker host. For example, Use the same subnet and gateway of the Docker host ethernet interface that is specified by the `-o parent=` option.
- The parent interface used in this example is `eth0` and it is on the subnet `172.16.86.0/24`. The containers in the `docker network` will also need to be on this same subnet as the parent `-o parent=`. The gateway is an external router on the network, not any ip masquerading or any other local proxy.
- The driver is specified with `-d driver_name` option. In this case `-d macvlan`
- The parent interface `-o parent=eth0` is configured as followed:
```
ip addr show eth0
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 172.16.86.250/24 brd 172.16.86.255 scope global eth0
```
Create the macvlan network and run a couple of containers attached to it:
```
# Macvlan (-o macvlan_mode= Defaults to Bridge mode if not specified)
docker network create -d macvlan \
--subnet=172.16.86.0/24 \
--gateway=172.16.86.1 \
-o parent=eth0 pub_net
# Run a container on the new network specifying the --ip address.
docker run --net=pub_net --ip=172.16.86.10 -itd alpine /bin/sh
# Start a second container and ping the first
docker run --net=pub_net -it --rm alpine /bin/sh
ping -c 4 172.16.86.10
```
Take a look at the containers ip and routing table:
```
ip a show eth0
eth0@if3: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UNKNOWN
link/ether 46:b2:6b:26:2f:69 brd ff:ff:ff:ff:ff:ff
inet 172.16.86.2/24 scope global eth0
ip route
default via 172.16.86.1 dev eth0
172.16.86.0/24 dev eth0 src 172.16.86.2
# NOTE: the containers can NOT ping the underlying host interfaces as
# they are intentionally filtered by Linux for additional isolation.
# In this case the containers cannot ping the -o parent=172.16.86.250
```
You can explicitly specify the `bridge` mode option `-o macvlan_mode=bridge`. It is the default so will be in `bridge` mode either way.
While the `eth0` interface does not need to have an IP address in Macvlan Bridge mode or Ipvlan L2 mode it is not uncommon to have an IP address on the interface. Addresses can be excluded from getting an address from the default built in IPAM by using the `--aux-address=x.x.x.x` flag. This will blacklist the specified address from being handed out to containers. The same network example above blocking the `-o parent=eth0` address from being handed out to a container.
```
docker network create -d macvlan \
--subnet=172.16.86.0/24 \
--gateway=172.16.86.1 \
--aux-address="exclude_host=172.16.86.250" \
-o parent=eth0 pub_net
```
Another option for subpool IP address selection in a network provided by the default Docker IPAM driver is to use `--ip-range=`. This specifies the driver to allocate container addresses from this pool rather then the broader range from the `--subnet=` argument from a network create as seen in the following example that will allocate addresses beginning at `192.168.32.128` and increment upwards from there.
```
docker network create -d macvlan \
--subnet=192.168.32.0/24 \
--ip-range=192.168.32.128/25 \
--gateway=192.168.32.254 \
-o parent=eth0 macnet32
# Start a container and verify the address is 192.168.32.128
docker run --net=macnet32 -it --rm alpine /bin/sh
```
The network can then be deleted with:
```
docker network rm <network_name or id>
```
- **Note:** In both Macvlan and Ipvlan you are not able to ping or communicate with the default namespace IP address. For example, if you create a container and try to ping the Docker host's `eth0` it will **not** work. That traffic is explicitly filtered by the kernel modules themselves to offer additional provider isolation and security.
For more on Docker networking commands see [Working with Docker network commands](https://docs.docker.com/engine/userguide/networking/work-with-networks/)
### Ipvlan L2 Mode Example Usage
The ipvlan `L2` mode example is virtually identical to the macvlan `bridge` mode example. The driver is specified with `-d driver_name` option. In this case `-d ipvlan`
The ipvlan `L2` mode example is like the following image. The driver is specified with `-d driver_name` option. In this case `-d ipvlan`.
![Simple Ipvlan L2 Mode Example](images/ipvlan_l2_simple.png)
@ -166,11 +52,11 @@ docker run --net=db_net -it --rm alpine /bin/sh
# they are intentionally filtered by Linux for additional isolation.
```
The default mode for Ipvlan is `l2`. The default mode for Macvlan is `bridge`. If `-o ipvlan_mode=` or `-o macvlan_mode=` are left unspecified, the default modes will be used. Similarly, if the `--gateway` is left empty, the first usable address on the network will be set as the gateway. For example, if the subnet provided in the network create is `--subnet=192.168.1.0/24` then the gateway the container receives is `192.168.1.1`.
The default mode for Ipvlan is `l2`. If `-o ipvlan_mode=` are left unspecified, the default mode will be used. Similarly, if the `--gateway` is left empty, the first usable address on the network will be set as the gateway. For example, if the subnet provided in the network create is `--subnet=192.168.1.0/24` then the gateway the container receives is `192.168.1.1`.
To help understand how this mode interacts with other hosts, the following figure shows the same layer 2 segment between two Docker hosts that applies to both Macvlan Bride mode and Ipvlan L2 mode.
To help understand how this mode interacts with other hosts, the following figure shows the same layer 2 segment between two Docker hosts that applies to and Ipvlan L2 mode.
![Multiple Ipvlan and Macvlan Hosts](images/macvlan-bridge-ipvlan-l2.png)
![Multiple Ipvlan Hosts](images/macvlan-bridge-ipvlan-l2.png)
The following will create the exact same network as the network `db_net` created prior, with the driver defaults for `--gateway=192.168.1.1` and `-o ipvlan_mode=l2`.
@ -219,84 +105,6 @@ docker exec -it cid2 /bin/sh
docker exec -it cid3 /bin/sh
```
### Macvlan 802.1q Trunk Bridge Mode Example Usage
VLANs (Virtual Local Area Networks) have long been a primary means of virtualizing data center networks and are still in virtually all existing networks today. VLANs work by tagging a Layer-2 isolation domain with a 12-bit identifier ranging from 1-4094 that is inserted into a packet header that enables a logical grouping of a single or multiple subnets of both IPv4 and IPv6. It is very common for network operators to separate traffic using VLANs based on a subnet(s) function or security profile such as `web`, `db` or any other isolation needs.
It is very common to have a compute host requirement of running multiple virtual networks concurrently on a host. Linux networking has long supported VLAN tagging, also known by its standard 802.1q, for maintaining datapath isolation between networks. The Ethernet link connected to a Docker host can be configured to support the 802.1q VLAN IDs, by creating Linux sub-interfaces, each one dedicated to a unique VLAN ID.
![Simple Ipvlan L2 Mode Example](images/multi_tenant_8021q_vlans.png)
Trunking 802.1q to a Linux host is notoriously painful for many in operations. It requires configuration file changes in order to be persistent through a reboot. If a bridge is involved, a physical NIC needs to be moved into the bridge and the bridge then gets the IP address. This has lead to many a stranded servers since the risk of cutting off access during that convoluted process is high.
Like all of the Docker network drivers, the overarching goal is to alleviate the operational pains of managing network resources. To that end, when a network receives a sub-interface as the parent that does not exist, the drivers create the VLAN tagged interfaces while creating the network.
In the case of a host reboot, instead of needing to modify often complex network configuration files the driver will recreate all network links when the Docker daemon restarts. The driver tracks if it created the VLAN tagged sub-interface originally with the network create and will **only** recreate the sub-interface after a restart or delete `docker network rm` the link if it created it in the first place with `docker network create`.
If the user doesn't want Docker to modify the `-o parent` sub-interface, the user simply needs to pass an existing link that already exists as the parent interface. Parent interfaces such as `eth0` are not deleted, only sub-interfaces that are not master links.
For the driver to add/delete the vlan sub-interfaces the format needs to be `interface_name.vlan_tag`.
For example: `eth0.50` denotes a parent interface of `eth0` with a slave of `eth0.50` tagged with vlan id `50`. The equivalent `ip link` command would be `ip link add link eth0 name eth0.50 type vlan id 50`.
Replace the `macvlan` with `ipvlan` in the `-d` driver argument to create macvlan 802.1q trunks.
**Vlan ID 50**
In the first network tagged and isolated by the Docker host, `eth0.50` is the parent interface tagged with vlan id `50` specified with `-o parent=eth0.50`. Other naming formats can be used, but the links need to be added and deleted manually using `ip link` or Linux configuration files. As long as the `-o parent` exists anything can be used if compliant with Linux netlink.
```
# now add networks and hosts as you would normally by attaching to the master (sub)interface that is tagged
docker network create -d macvlan \
--subnet=192.168.50.0/24 \
--gateway=192.168.50.1 \
-o parent=eth0.50 macvlan50
# In two separate terminals, start a Docker container and the containers can now ping one another.
docker run --net=macvlan50 -it --name macvlan_test5 --rm alpine /bin/sh
docker run --net=macvlan50 -it --name macvlan_test6 --rm alpine /bin/sh
```
**Vlan ID 60**
In the second network, tagged and isolated by the Docker host, `eth0.60` is the parent interface tagged with vlan id `60` specified with `-o parent=eth0.60`. The `macvlan_mode=` defaults to `macvlan_mode=bridge`. It can also be explicitly set with the same result as shown in the next example.
```
# now add networks and hosts as you would normally by attaching to the master (sub)interface that is tagged.
docker network create -d macvlan \
--subnet=192.168.60.0/24 \
--gateway=192.168.60.1 \
-o parent=eth0.60 -o \
-o macvlan_mode=bridge macvlan60
# In two separate terminals, start a Docker container and the containers can now ping one another.
docker run --net=macvlan60 -it --name macvlan_test7 --rm alpine /bin/sh
docker run --net=macvlan60 -it --name macvlan_test8 --rm alpine /bin/sh
```
**Example:** Multi-Subnet Macvlan 802.1q Trunking
The same as the example before except there is an additional subnet bound to the network that the user can choose to provision containers on. In MacVlan/Bridge mode, containers can only ping one another if they are on the same subnet/broadcast domain unless there is an external router that routes the traffic (answers ARP etc) between the two subnets.
```
### Create multiple L2 subnets
docker network create -d ipvlan \
--subnet=192.168.210.0/24 \
--subnet=192.168.212.0/24 \
--gateway=192.168.210.254 \
--gateway=192.168.212.254 \
-o ipvlan_mode=l2 ipvlan210
# Test 192.168.210.0/24 connectivity between containers
docker run --net=ipvlan210 --ip=192.168.210.10 -itd alpine /bin/sh
docker run --net=ipvlan210 --ip=192.168.210.9 -it --rm alpine ping -c 2 192.168.210.10
# Test 192.168.212.0/24 connectivity between containers
docker run --net=ipvlan210 --ip=192.168.212.10 -itd alpine /bin/sh
docker run --net=ipvlan210 --ip=192.168.212.9 -it --rm alpine ping -c 2 192.168.212.10
```
### Ipvlan 802.1q Trunk L2 Mode Example Usage
Architecturally, Ipvlan L2 mode trunking is the same as Macvlan with regard to gateways and L2 path isolation. There are nuances that can be advantageous for CAM table pressure in ToR switches, one MAC per port and MAC exhaustion on a host's parent NIC to name a few. The 802.1q trunk scenario looks the same. Both modes adhere to tagging standards and have seamless integration with the physical network for underlay integration and hardware vendor plugin integrations.
@ -356,7 +164,7 @@ $ ip route
192.168.30.0/24 dev eth0 src 192.168.30.2
```
Example: Multi-Subnet Ipvlan L2 Mode starting two containers on the same subnet and pinging one another. In order for the `192.168.114.0/24` to reach `192.168.116.0/24` it requires an external router in L2 mode. L3 mode can route between subnets that share a common `-o parent=`. This same multi-subnet example is also valid for Macvlan `bridge` mode.
Example: Multi-Subnet Ipvlan L2 Mode starting two containers on the same subnet and pinging one another. In order for the `192.168.114.0/24` to reach `192.168.116.0/24` it requires an external router in L2 mode. L3 mode can route between subnets that share a common `-o parent=`.
Secondary addresses on network routers are common as an address space becomes exhausted to add another secondary to a L3 vlan interface or commonly referred to as a "switched virtual interface" (SVI).
@ -393,13 +201,13 @@ IPVlan will require routes to be distributed to each endpoint. The driver only b
![Docker Ipvlan L2 Mode](images/ipvlan-l3.png)
Ipvlan L3 mode drops all broadcast and multicast traffic. This reason alone makes Ipvlan L3 mode a prime candidate for those looking for massive scale and predictable network integrations. It is predictable and in turn will lead to greater uptimes because there is no bridging involved. Bridging loops have been responsible for high profile outages that can be hard to pinpoint depending on the size of the failure domain. This is due to the cascading nature of BPDUs (Bridge Port Data Units) that are flooded throughout a broadcast domain (VLAN) to find and block topology loops. Eliminating bridging domains, or at the least, keeping them isolated to a pair of ToRs (top of rack switches) will reduce hard to troubleshoot bridging instabilities. Macvlan Bridge and Ipvlan L2 modes are well suited for isolated VLANs only trunked into a pair of ToRs that can provide a loop-free non-blocking fabric. The next step further is to route at the edge via Ipvlan L3 mode that reduces a failure domain to a local host only.
Ipvlan L3 mode drops all broadcast and multicast traffic. This reason alone makes Ipvlan L3 mode a prime candidate for those looking for massive scale and predictable network integrations. It is predictable and in turn will lead to greater uptimes because there is no bridging involved. Bridging loops have been responsible for high profile outages that can be hard to pinpoint depending on the size of the failure domain. This is due to the cascading nature of BPDUs (Bridge Port Data Units) that are flooded throughout a broadcast domain (VLAN) to find and block topology loops. Eliminating bridging domains, or at the least, keeping them isolated to a pair of ToRs (top of rack switches) will reduce hard to troubleshoot bridging instabilities. Ipvlan L2 modes is well suited for isolated VLANs only trunked into a pair of ToRs that can provide a loop-free non-blocking fabric. The next step further is to route at the edge via Ipvlan L3 mode that reduces a failure domain to a local host only.
- L3 mode needs to be on a separate subnet as the default namespace since it requires a netlink route in the default namespace pointing to the Ipvlan parent interface.
- The parent interface used in this example is `eth0` and it is on the subnet `192.168.1.0/24`. Notice the `docker network` is **not** on the same subnet as `eth0`.
- Unlike macvlan bridge mode and ipvlan l2 modes, different subnets/networks can ping one another as long as they share the same parent interface `-o parent=`.
- Unlike ipvlan l2 modes, different subnets/networks can ping one another as long as they share the same parent interface `-o parent=`.
```
ip a show eth0
@ -444,61 +252,6 @@ $ ip route
In order to ping the containers from a remote Docker host or the container be able to ping a remote host, the remote host or the physical network in between need to have a route pointing to the host IP address of the container's Docker host eth interface. More on this as we evolve the Ipvlan `L3` story.
### Dual Stack IPv4 IPv6 Macvlan Bridge Mode
**Example:** Macvlan Bridge mode, 802.1q trunk, VLAN ID: 218, Multi-Subnet, Dual Stack
```
# Create multiple bridge subnets with a gateway of x.x.x.1:
docker network create -d macvlan \
--subnet=192.168.216.0/24 --subnet=192.168.218.0/24 \
--gateway=192.168.216.1 --gateway=192.168.218.1 \
--subnet=2001:db8:abc8::/64 --gateway=2001:db8:abc8::10 \
-o parent=eth0.218 \
-o macvlan_mode=bridge macvlan216
# Start a container on the first subnet 192.168.216.0/24
docker run --net=macvlan216 --name=macnet216_test --ip=192.168.216.10 -itd alpine /bin/sh
# Start a container on the second subnet 192.168.218.0/24
docker run --net=macvlan216 --name=macnet216_test --ip=192.168.218.10 -itd alpine /bin/sh
# Ping the first container started on the 192.168.216.0/24 subnet
docker run --net=macvlan216 --ip=192.168.216.11 -it --rm alpine /bin/sh
ping 192.168.216.10
# Ping the first container started on the 192.168.218.0/24 subnet
docker run --net=macvlan216 --ip=192.168.218.11 -it --rm alpine /bin/sh
ping 192.168.218.10
```
View the details of one of the containers:
```
docker run --net=macvlan216 --ip=192.168.216.11 -it --rm alpine /bin/sh
root@526f3060d759:/# ip a show eth0
eth0@if92: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
link/ether 8e:9a:99:25:b6:16 brd ff:ff:ff:ff:ff:ff
inet 192.168.216.11/24 scope global eth0
valid_lft forever preferred_lft forever
inet6 2001:db8:abc4::8c9a:99ff:fe25:b616/64 scope link tentative
valid_lft forever preferred_lft forever
inet6 2001:db8:abc8::2/64 scope link nodad
valid_lft forever preferred_lft forever
# Specified v4 gateway of 192.168.216.1
root@526f3060d759:/# ip route
default via 192.168.216.1 dev eth0
192.168.216.0/24 dev eth0 proto kernel scope link src 192.168.216.11
# Specified v6 gateway of 2001:db8:abc8::10
root@526f3060d759:/# ip -6 route
2001:db8:abc4::/64 dev eth0 proto kernel metric 256
2001:db8:abc8::/64 dev eth0 proto kernel metric 256
default via 2001:db8:abc8::10 dev eth0 metric 1024
```
### Dual Stack IPv4 IPv6 Ipvlan L2 Mode
- Not only does Libnetwork give you complete control over IPv4 addressing, but it also gives you total control over IPv6 addressing as well as feature parity between the two address families.
@ -602,13 +355,10 @@ Start a second container with a specific `--ip4` address and ping the first host
docker run --net=ipvlan140 --ip=192.168.140.10 -it --rm alpine /bin/sh
```
**Note**: Different subnets on the same parent interface in both Ipvlan `L2` mode and Macvlan `bridge` mode cannot ping one another. That requires a router to proxy-arp the requests with a secondary subnet. However, Ipvlan `L3` will route the unicast traffic between disparate subnets as long as they share the same `-o parent` parent link.
**Note**: Different subnets on the same parent interface in Ipvlan `L2` mode cannot ping one another. That requires a router to proxy-arp the requests with a secondary subnet. However, Ipvlan `L3` will route the unicast traffic between disparate subnets as long as they share the same `-o parent` parent link.
### Dual Stack IPv4 IPv6 Ipvlan L3 Mode
**Example:** IpVlan L3 Mode Dual Stack IPv4/IPv6, Multi-Subnet w/ 802.1q Vlan Tag:118
As in all of the examples, a tagged VLAN interface does not have to be used. The sub-interfaces can be swapped with `eth0`, `eth1`, `bond0` or any other valid interface on the host other then the `lo` loopback.