1
0
Fork 0
Commit Graph

8 Commits (e999b4f7262afe0d187c20d8fc567be14c669f67)

Author SHA1 Message Date
q3k a5ed644980 k0.hswaw.net: pass metallb through Calico
Previously, we had the following setup:

                          .-----------.
                          | .....     |
                        .-----------.-|
                        | dcr01s24  | |
                      .-----------.-| |
                      | dcr01s22  | | |
                  .---|-----------| |-'
    .--------.    |   |---------. | |
    | dcsw01 | <----- | metallb | |-'
    '--------'        |---------' |
                      '-----------'

Ie., each metallb on each node directly talked to dcsw01 over BGP to
announce ExternalIPs to our L3 fabric.

Now, we rejigger the configuration to instead have Calico's BIRD
instances talk BGP to dcsw01, and have metallb talk locally to Calico.

                      .-------------------------.
                      | dcr01s24                |
                      |-------------------------|
    .--------.        |---------.   .---------. |
    | dcsw01 | <----- | Calico  |<--| metallb | |
    '--------'        |---------'   '---------' |
                      '-------------------------'

This makes Calico announce our pod/service networks into our L3 fabric!

Calico and metallb talk to eachother over 127.0.0.1 (they both run with
Host Networking), but that requires one side to flip to pasive mode. We
chose to do that with Calico, by overriding its BIRD config and
special-casing any 127.0.0.1 peer to enable passive mode.

We also override Calico's Other Bird Template (bird_ipam.cfg) to fiddle
with the kernel programming filter (ie. to-kernel-routing-table filter),
where we disable programming unreachable routes. This is because routes
coming from metallb have their next-hop set to 127.0.0.1, which makes
bird mark them as unreachable. Unreachable routes in the kernel will
break local access to ExternalIPs, eg. register access from containerd.

All routes pass through without route reflectors and a full mesh as we
use eBGP over private ASNs in our fabric.

We also have to make Calico aware of metallb pools - otherwise, routes
announced by metallb end up being filtered by Calico.

This is all mildly hacky. Here's hoping that Calico will be able to some
day gain metallb-like functionality, ie. IPAM for
externalIPs/LoadBalancers/...

There seems to be however one problem with this change (but I'm not
fixing it yet as it's not critical): metallb would previously only
announce IPs from nodes that were serving that service. Now, however,
the Calico internal mesh makes those appear from every node. This can
probably be fixed by disabling local meshing, enabling route reflection
on dcsw01 (to recreate the mesh routing through dcsw01). Or, maybe by
some more hacking of the Calico BIRD config :/.

Change-Id: I3df1f6ae7fa1911dd53956ced3b073581ef0e836
2020-09-23 18:55:12 +00:00
q3k e55493f635 calico: fix access to resources from controller
This fixes even more networking issues.

Change-Id: I754656a01e3de8a34055280908b343a1a25a4707
2020-05-30 17:57:05 +02:00
q3k ba375e62b2 calico: fix node name selection
This was an attempt to make new calico nodes use a full FQDN. However,
this change seemingly also makes the calico control plane use the FQDN
for all existing nodes, as such breaking CNI for new pods.

We revert this change, thereby keeping all calico nodes names as
hostnames. We could fix this by editing /var/lib/calico/nodename on
hosts to FQDNs, but it might not be worth the effort.

See https://github.com/projectcalico/calico/issues/1093 for more
context.

Change-Id: I52bfb00f604053d57d3009aebd6c50db7dc74f58
2020-05-30 16:18:13 +02:00
q3k d81bf72d7f calico: upgrade to 3.14, fix calicoctl
We still use etcd as the data store (and as such didn't set up k8s CRDs
for Calico), but that's okay for now.

Change-Id: If6d66f505c6b40f2646ffae7d33d0d641d34a963
2020-05-28 16:47:16 +02:00
q3k d493ab66ca *: add dcr01s{22,24}
Change-Id: I072e825e2e1d199d9da50b9d38a9ffba68e61182
2019-10-31 17:07:50 +01:00
q3k 73cef11c85 *: rejigger tls certs and more
This pretty large change does the following:

 - moves nix from bootstrap.hswaw.net to nix/
 - changes clustercfg to use cfssl and moves it to cluster/clustercfg
 - changes clustercfg to source information about target location of
   certs from nix
 - changes clustercfg to push nix config
 - changes tls certs to have more than one CA
 - recalculates all TLS certs
   (it keeps the old serviceaccoutns key, otherwise we end up with
   invalid serviceaccounts - the cert doesn't match, but who cares,
   it's not used anyway)
2019-04-07 00:06:23 +02:00
q3k e3af1eb852 cluster: autodetect IP address
This is so that Calico starts with the proper subnet. Feeding it just an
IP from the node status will mean it parses it as /32 and uses IPIP
tunnels for all connectivity.
2019-01-18 09:39:57 +01:00
q3k af3be426ad cluster: deploy calico and metrics service 2019-01-17 18:57:19 +01:00