hscloud

Author	SHA1	Message	Date
q3k	a5ed644980	k0.hswaw.net: pass metallb through Calico

Previously, we had the following setup:

                          .-----------.
                          | .....     |
                        .-----------.-|
                        | dcr01s24  | |
                      .-----------.-| |
                      | dcr01s22  | | |
                  .---|-----------| |-'
    .--------.    |   |---------. | |
    | dcsw01 | <----- | metallb | |-'
    '--------'        |---------' |
                      '-----------'

I.e., each metallb on each node directly talked to dcsw01 over BGP to announce ExternalIPs to our L3 fabric.

Now, we rejigger the configuration to instead have Calico's BIRD instances talk BGP to dcsw01, and have metallb talk locally to Calico.

                      .-------------------------.
                      | dcr01s24                |
                      |-------------------------|
    .--------.        |---------.   .---------. |
    | dcsw01 | <----- | Calico  |<--| metallb | |
    '--------'        |---------'   '---------' |
                      '-------------------------'

This makes Calico announce our pod/service networks into our L3 fabric!

Calico and metallb talk to each other over 127.0.0.1 (they both run with Host Networking), but that requires one side to flip to passive mode. We chose to do that with Calico, by overriding its BIRD config and special-casing any 127.0.0.1 peer to enable passive mode.

We also override Calico's Other Bird Template (bird_ipam.cfg) to fiddle with the kernel programming filter (i.e. the to-kernel-routing-table filter), where we disable programming unreachable routes. This is because routes coming from metallb have their next-hop set to 127.0.0.1, which makes bird mark them as unreachable. Unreachable routes in the kernel will break local access to ExternalIPs, e.g. registry access from containerd.

All routes pass through without route reflectors and a full mesh, as we use eBGP over private ASNs in our fabric.

We also have to make Calico aware of metallb pools - otherwise, routes announced by metallb end up being filtered by Calico.

This is all mildly hacky. Here's hoping that Calico will some day gain metallb-like functionality, i.e. IPAM for externalIPs/LoadBalancers/...

There does seem to be one problem with this change, however (but I'm not fixing it yet as it's not critical): metallb would previously only announce IPs from nodes that were serving that service. Now, however, the Calico internal mesh makes those appear from every node. This can probably be fixed by disabling local meshing and enabling route reflection on dcsw01 (to recreate the mesh routing through dcsw01). Or, maybe by some more hacking of the Calico BIRD config :/.

Change-Id: I3df1f6ae7fa1911dd53956ced3b073581ef0e836	2020-09-23 18:55:12 +00:00
q3k	e55493f635	calico: fix access to resources from controller This fixes even more networking issues. Change-Id: I754656a01e3de8a34055280908b343a1a25a4707	2020-05-30 17:57:05 +02:00
q3k	ba375e62b2	calico: fix node name selection This was an attempt to make new calico nodes use a full FQDN. However, this change seemingly also makes the calico control plane use the FQDN for all existing nodes, as such breaking CNI for new pods. We revert this change, thereby keeping all calico node names as hostnames. We could fix this by editing /var/lib/calico/nodename on hosts to FQDNs, but it might not be worth the effort. See https://github.com/projectcalico/calico/issues/1093 for more context. Change-Id: I52bfb00f604053d57d3009aebd6c50db7dc74f58	2020-05-30 16:18:13 +02:00
q3k	d81bf72d7f	calico: upgrade to 3.14, fix calicoctl We still use etcd as the data store (and as such didn't set up k8s CRDs for Calico), but that's okay for now. Change-Id: If6d66f505c6b40f2646ffae7d33d0d641d34a963	2020-05-28 16:47:16 +02:00
q3k	d493ab66ca	*: add dcr01s{22,24} Change-Id: I072e825e2e1d199d9da50b9d38a9ffba68e61182	2019-10-31 17:07:50 +01:00
q3k	73cef11c85	*: rejigger tls certs and more This pretty large change does the following: - moves nix from bootstrap.hswaw.net to nix/ - changes clustercfg to use cfssl and moves it to cluster/clustercfg - changes clustercfg to source information about target location of certs from nix - changes clustercfg to push nix config - changes tls certs to have more than one CA - recalculates all TLS certs (it keeps the old serviceaccounts key, otherwise we end up with invalid serviceaccounts - the cert doesn't match, but who cares, it's not used anyway)	2019-04-07 00:06:23 +02:00
q3k	e3af1eb852	cluster: autodetect IP address This is so that Calico starts with the proper subnet. Feeding it just an IP from the node status will mean it parses it as /32 and uses IPIP tunnels for all connectivity.	2019-01-18 09:39:57 +01:00
q3k	af3be426ad	cluster: deploy calico and metrics service	2019-01-17 18:57:19 +01:00
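
Commit e3af1eb852 says that a bare node IP gets parsed as a /32. The following is a minimal sketch of that effect using Python's standard ipaddress module, not Calico's actual code. The addresses are made-up documentation addresses and same_subnet is a hypothetical helper; it assumes the decision to tunnel hinges on whether a peer falls inside the node's own subnet.

```python
import ipaddress


def same_subnet(local: str, peer: str) -> bool:
    """Return True if the peer address lies inside the local node's network.

    Hypothetical helper for illustration only; not taken from Calico.
    """
    local_if = ipaddress.ip_interface(local)
    peer_ip = ipaddress.ip_interface(peer).ip
    return peer_ip in local_if.network


# A bare address carries no prefix length, so it becomes a /32 host network.
bare = ipaddress.ip_interface("192.0.2.10")
# The same address with its prefix length, as autodetection would supply it.
detected = ipaddress.ip_interface("192.0.2.10/24")

print(bare.network)        # 192.0.2.10/32
print(detected.network)    # 192.0.2.0/24

# With a /32, no other node ever looks "local", so every peer would be
# treated as being on a different subnet.
print(same_subnet("192.0.2.10", "192.0.2.11"))     # False
print(same_subnet("192.0.2.10/24", "192.0.2.11"))  # True
```

Under that assumption, a node that only knows its address as a /32 sees every neighbour as remote, which matches the commit's observation that IPIP tunnels end up being used for all connectivity.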

8 Commits (e999b4f7262afe0d187c20d8fc567be14c669f67)
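
The two BIRD overrides described in commit a5ed644980 (passive mode for 127.0.0.1 peers, and not programming unreachable routes into the kernel) can be modelled roughly as below. This is a toy Python model, not BIRD configuration and not the templates from the repository: the Route type, the function names, and the example prefixes are all hypothetical, and "unreachable" is approximated as "next hop is a loopback address", which is the only case the commit message describes.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str    # destination network, e.g. an ExternalIP as a /32
    next_hop: str  # next-hop address as received over BGP


def peer_is_passive(peer_ip: str) -> bool:
    """Special-case the local metallb peer: only 127.0.0.1 is made passive."""
    return ipaddress.ip_address(peer_ip) == ipaddress.ip_address("127.0.0.1")


def is_unreachable(route: Route) -> bool:
    """Assumption: model routes with a loopback next hop as unreachable."""
    return ipaddress.ip_address(route.next_hop).is_loopback


def program_into_kernel(route: Route) -> bool:
    """Kernel export filter: skip unreachable routes, accept the rest."""
    return not is_unreachable(route)


# Made-up examples: one route learned from local metallb, one pod network.
from_metallb = Route(prefix="203.0.113.5/32", next_hop="127.0.0.1")
pod_network = Route(prefix="10.10.24.0/26", next_hop="10.0.0.2")

print(peer_is_passive("127.0.0.1"))       # True
print(peer_is_passive("10.0.0.1"))        # False
print(program_into_kernel(from_metallb))  # False
print(program_into_kernel(pod_network))   # True
```

The point of the filter is that the metallb-learned route is still held by the routing daemon and can be announced upstream; it is only kept out of the local kernel table, where an unreachable entry would break local access to the ExternalIP.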