This is in preparation for bringing up a Matrix server for hsp.sh.
Verified to cause no diff on prod.
Change-Id: Ied2de210692e3ddfdb1d3f37b12893b214c34b0b
This is an amalgamation of a handful of small changes to Go deps.
Notably:
- we remove our opencensus-proto fork, use upstream, use exclude=src to
fix the build
- unvendorify some deps
- bump io_rules_go to fix WKT resolution
Notably, we no longer get the 'protoc-gen-go' error when running
kubecfg/kubectl.
Change-Id: I34fb9e78b2b12e4543142183d601d01987076f32
This adds Bazel/hscloud integration to gostatic, via gostatic_tarball.
A sample is provided in //tools/gostatic/example; it can be built using:
    bazel build //tools/gostatic/example
The resulting tarball can then be extracted and viewed in a web
browser.
Change-Id: Idf8d4a8e0ee3a5ae07f7449a25909478c2d8b105
A customer was missing a static v6 route via their router. Since we
don't want to add them to networking.interfaces.routes.* (as this
restarts the whole scripted network stack in NixOS), we add them to
bird. This requires implementing hscloud.routing.static.
Change-Id: I0a205ed1e1f17a86de43aaf72ab6c2694a069112
If set, this enables Redis's internal authentication scheme. Supports
secretRefs, as well as values passed directly.
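As an illustration of the two input forms, here is a minimal sketch of how a password could be turned into a Kubernetes container env entry. The function name, the REDIS_PASSWORD variable name and the input shape are hypothetical and not taken from this change; only the value / valueFrom.secretKeyRef output shape is the standard Kubernetes EnvVar format.

```python
def redis_password_env(password, env_name="REDIS_PASSWORD"):
    """Build a Kubernetes EnvVar dict from a literal or a secretRef.

    `password` is None, a plain string, or a dict of the (assumed) form
    {"secretRef": {"name": ..., "key": ...}}.
    """
    if password is None:
        # Nothing set: authentication stays disabled.
        return None
    if isinstance(password, str):
        # Value passed directly.
        return {"name": env_name, "value": password}
    ref = password["secretRef"]
    # Value resolved by Kubernetes from a Secret at pod start.
    return {
        "name": env_name,
        "valueFrom": {"secretKeyRef": {"name": ref["name"], "key": ref["key"]}},
    }
```

The secretRef form keeps the password out of the rendered manifest, which is usually the reason to prefer it over a direct value.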
Change-Id: Ie902b8d79fdc4aa83ad8ad123e79f0bc80c1251f
We want to be able to scrape controller-manager and scheduler metrics
into Prometheus. For that, each of them needs to:
1) listen on a secure port
2) have authn enabled
With this, any k8s user with the right permissions (and a bearer token
or TLS certificate) can come in and access metrics over a node's public
IP address. Access without a certificate/token gets thrown into the
system:anonymous user, which has no access to any API.
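A minimal sketch of what an authenticated scrape request looks like. It assumes the upstream default secure ports (10257 for kube-controller-manager, 10259 for kube-scheduler), which may differ from what is configured here; the function name and arguments are hypothetical. The sketch only builds the request and does not send it.

```python
import urllib.request

# Upstream default secure ports; an assumption, not read from this cluster.
SECURE_PORTS = {"kube-controller-manager": 10257, "kube-scheduler": 10259}


def metrics_request(node_ip, component, bearer_token):
    """Build an HTTPS request for a component's /metrics endpoint."""
    url = f"https://{node_ip}:{SECURE_PORTS[component]}/metrics"
    req = urllib.request.Request(url)
    # Without this header the request is treated as system:anonymous.
    req.add_header("Authorization", f"Bearer {bearer_token}")
    return req
```

Sending the request would additionally need a TLS context that trusts the serving certificate's CA.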
Change-Id: I267680f92f748ba63b6762e6aaba3c417446e50b
This allows us to use rules_docker from NixOS. However, the built
binaries are broken because of the Docker base image not being NixOS
based.
Change-Id: I29b93f1bae1575b04f97265c67497081d11a1910
This notably fixes the annoying loopback issues that prevented hosts
from accessing externalip services with externalTrafficPolicy: local
from nodes that weren't running the service.
Which means, hopefully, no more registry pull failures when
nginx-ingress gets misplaced!
Change-Id: Id4923fd0fce2e28c31a1e65518b0e984165ca9ec
This has been deployed to k0 nodes.
Current state of cluster certificates:
cluster/certs/ca-etcd.crt
Not After : Apr 4 17:59:00 2024 GMT
cluster/certs/ca-etcdpeer.crt
Not After : Apr 4 17:59:00 2024 GMT
cluster/certs/ca-kube.crt
Not After : Apr 4 17:59:00 2024 GMT
cluster/certs/ca-kubefront.crt
Not After : Apr 4 17:59:00 2024 GMT
cluster/certs/ca-kube-prodvider.cert
Not After : Sep 1 21:30:00 2021 GMT
cluster/certs/etcd-bc01n01.hswaw.net.cert
Not After : Mar 28 15:53:00 2021 GMT
cluster/certs/etcd-bc01n02.hswaw.net.cert
Not After : Mar 28 16:45:00 2021 GMT
cluster/certs/etcd-bc01n03.hswaw.net.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/etcd-calico.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/etcd-dcr01s22.hswaw.net.cert
Not After : Oct 3 15:33:00 2021 GMT
cluster/certs/etcd-dcr01s24.hswaw.net.cert
Not After : Oct 3 15:38:00 2021 GMT
cluster/certs/etcd-kube.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/etcdpeer-bc01n01.hswaw.net.cert
Not After : Mar 28 15:53:00 2021 GMT
cluster/certs/etcdpeer-bc01n02.hswaw.net.cert
Not After : Mar 28 16:45:00 2021 GMT
cluster/certs/etcdpeer-bc01n03.hswaw.net.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/etcdpeer-dcr01s22.hswaw.net.cert
Not After : Oct 3 15:33:00 2021 GMT
cluster/certs/etcdpeer-dcr01s24.hswaw.net.cert
Not After : Oct 3 15:38:00 2021 GMT
cluster/certs/etcd-root.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/kube-apiserver.cert
Not After : Oct 3 15:26:00 2021 GMT
cluster/certs/kube-controllermanager.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/kubefront-apiserver.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/kube-kubelet-bc01n01.hswaw.net.cert
Not After : Mar 28 15:53:00 2021 GMT
cluster/certs/kube-kubelet-bc01n02.hswaw.net.cert
Not After : Mar 28 16:45:00 2021 GMT
cluster/certs/kube-kubelet-bc01n03.hswaw.net.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/kube-kubelet-dcr01s22.hswaw.net.cert
Not After : Oct 3 15:33:00 2021 GMT
cluster/certs/kube-kubelet-dcr01s24.hswaw.net.cert
Not After : Oct 3 15:38:00 2021 GMT
cluster/certs/kube-proxy.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/kube-scheduler.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/kube-serviceaccounts.cert
Not After : Mar 28 15:15:00 2021 GMT
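A listing like the one above can be checked mechanically. The following is a minimal sketch, assuming input lines alternate between a certificate path and its 'Not After' line exactly as shown; the function names are hypothetical, and month-name parsing assumes an English/C locale.

```python
from datetime import datetime, timezone


def parse_not_after(lines):
    """Pair each certificate path with the 'Not After' date that follows it."""
    expiries = {}
    path = None
    for line in lines:
        line = line.strip()
        if line.startswith("Not After :"):
            stamp = line.split(":", 1)[1].strip()
            when = datetime.strptime(stamp, "%b %d %H:%M:%S %Y GMT")
            expiries[path] = when.replace(tzinfo=timezone.utc)
        elif line:
            # Any other non-empty line is taken to be a certificate path.
            path = line
    return expiries


def expiring_before(expiries, deadline):
    """Return the sorted paths of certificates that expire before `deadline`."""
    return sorted(p for p, when in expiries.items() if when < deadline)
```

For example, passing a deadline of 2021-04-01 UTC would return every certificate whose 'Not After' falls in March 2021.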
Change-Id: I94030ce78c10f7e9a0c0257d55145ef629195314
This introduces Nix, the package manager, and nixpkgs, the package
collection, into hscloud's bazel build machinery.
There are two reasons behind this:
- on NixOS, it's painful or at least very difficult to run hscloud out
of the box. rules_go in particular downloads a blob from the
Internet to get a Go toolchain, which fails outright. This solves
that and allows hscloud to be used on NixOS.
- on non-NixOS platforms that still might have access to Nix, this
allows us to somewhat hermeticize the build. Notably, Python now comes
from nixpkgs, and is fabricobbled in a way that makes pip3_import
use Nix system dependencies for ncurses and libpq.
This has been tested to run ci_presubmit on NixOS 20.09pre and Gentoo
~amd64.
Change-Id: Ic16e4827cb52a05aea0df0eed84d80c5e9ae0e07
This makes all Nix files addressable from root by file path.
For instance, if a file is located in //foo/bar:baz.nix containing:
    { pkgs, ... }:
    pkgs.stdenv.mkDerivation {
      pname = "foo";
      # ...
    }
You can then do:
    nix-build -A foo.bar.baz
All Nix files loaded this way must be functions taking a 'config'
attrset - see nix/readTree.nix for more information. Currently the
config attrset contains the following fields:
- hscloud: the root of the hscloud repository itself, which allows
for traversal via readTree (eg. hscloud.foo.bar.baz)
- pkgs: nixpkgs
- pkgsSrc: nixpkgs source/channel, useful to load NixOS modules.
- lib, stdenv: lib and stdenv from pkgs.
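The path-to-attribute mapping described above can be sketched as follows. This is only an illustration of the naming rule from the //foo/bar:baz.nix example; it is not how nix/readTree.nix is implemented, the function name is hypothetical, and any special handling of particular file names is ignored.

```python
def label_to_attr_path(label):
    """Map a //pkg/path:file.nix label to a dotted attribute path."""
    if not label.startswith("//") or not label.endswith(".nix"):
        raise ValueError(f"not a Nix file label: {label!r}")
    package, sep, filename = label[2:].partition(":")
    if not sep:
        raise ValueError(f"label has no ':file' part: {label!r}")
    # Directory components become nested attributes...
    parts = [p for p in package.split("/") if p]
    # ...and the file name, minus its .nix suffix, becomes the last one.
    parts.append(filename[: -len(".nix")])
    return ".".join(parts)
```

The result is the attribute path that would be passed to nix-build -A.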
Change-Id: Ieaacdcabceec18dd6c670d346928bff08b66cf79
This prevents metallb routes being announced from all peers to our ToR,
thereby preventing issues with traffic hitting services with
externalTrafficPolicy: local.
There still is the from-host loopback issue, but that will be fixed by
upgrading to kube 1.15.
Change-Id: Ifc9964b46840aee82d99f0b6550188550e46fe04
This fixes compatibility with prodaccess tools built with Go 1.15, which
introduced 'X.509 CommonName deprecation' [1].
[1] - https://golang.org/doc/go1.15#commonname
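Since Go 1.15, a certificate's CommonName is no longer treated as a host name when the certificate carries no Subject Alternative Names, so certificates need a SAN to pass host name verification. A minimal sketch of a check for this, operating on the dict shape returned by Python's ssl.SSLSocket.getpeercert(); the function name is hypothetical and this is not code from this change.

```python
def has_dns_san(cert):
    """Return True if the decoded certificate carries at least one DNS SAN.

    `cert` is a dict in the format returned by ssl.SSLSocket.getpeercert().
    """
    return any(kind == "DNS" for kind, _ in cert.get("subjectAltName", ()))
```

A certificate for which this returns False relies on CommonName alone and would be rejected by Go 1.15+ clients that verify by host name.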
Change-Id: I228cde3e5651a3e36f527783f2ccb4a2f6b7a8e3
This will be, at some point, a script to run on Gerrit presubmit (ie.
right before merge).
For now, you can manually run it to ensure that Everything At Least
Kinda Works.
Change-Id: I28b305fa81a4ca4a8e94ce4daa06fe9ae0184fe8
Previously, we had the following setup:
                      .-----------.
                      | .....     |
                    .-----------.-|
                    | dcr01s24  | |
                  .-----------.-| |
                  | dcr01s22  | | |
              .---|-----------| |-'
.--------.    |   |---------. | |
| dcsw01 | <----- | metallb | |-'
'--------'        |---------' |
                  '-----------'
Ie., the metallb instance on each node talked directly to dcsw01 over
BGP to announce ExternalIPs to our L3 fabric.
Now, we rejigger the configuration to instead have Calico's BIRD
instances talk BGP to dcsw01, and have metallb talk locally to Calico.
                  .-------------------------.
                  | dcr01s24                |
                  |-------------------------|
.--------.        |---------.   .---------. |
| dcsw01 | <----- | Calico  |<--| metallb | |
'--------'        |---------'   '---------' |
                  '-------------------------'
This makes Calico announce our pod/service networks into our L3 fabric!
Calico and metallb talk to each other over 127.0.0.1 (they both run with
Host Networking), but that requires one side to flip to passive mode. We
chose to do that with Calico, by overriding its BIRD config and
special-casing any 127.0.0.1 peer to enable passive mode.
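The special-casing can be sketched as below. This renders a BIRD 1.x style BGP protocol stanza from Python purely for illustration; it is not Calico's actual template, and the function name, protocol names and AS numbers are hypothetical.

```python
def render_bgp_peer(name, peer_ip, local_as, peer_as):
    """Render a BIRD BGP protocol stanza, passive for loopback peers."""
    lines = [
        f"protocol bgp {name} {{",
        f"  local as {local_as};",
        f"  neighbor {peer_ip} as {peer_as};",
    ]
    if peer_ip == "127.0.0.1":
        # Both daemons share the host network namespace, so only one side
        # may initiate the session; BIRD waits for metallb to connect.
        lines.append("  passive on;")
    lines.append("}")
    return "\n".join(lines)
```

With this, a peer at 127.0.0.1 gets a stanza containing 'passive on;', while any other peer is rendered unchanged.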
We also override Calico's Other Bird Template (bird_ipam.cfg) to fiddle
with the kernel programming filter (ie. to-kernel-routing-table filter),
where we disable programming unreachable routes. This is because routes
coming from metallb have their next-hop set to 127.0.0.1, which makes
bird mark them as unreachable. Unreachable routes in the kernel will
break local access to ExternalIPs, eg. registry access from containerd.
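A simplified model of the effect of that filter change, written in Python rather than BIRD's filter language. It assumes, per the description above, that a next-hop of 127.0.0.1 is what makes a route unreachable; the data shapes, function names and example prefixes are hypothetical.

```python
def is_unreachable(route):
    """Model of the description above: a loopback next-hop is unreachable."""
    return route["next_hop"] == "127.0.0.1"


def routes_for_kernel(routes):
    """Drop unreachable routes instead of programming them into the kernel."""
    return [r for r in routes if not is_unreachable(r)]


# Example: one pod network route and one ExternalIP learned from metallb.
example = [
    {"prefix": "10.10.24.0/26", "next_hop": "10.0.0.24"},
    {"prefix": "185.236.240.50/32", "next_hop": "127.0.0.1"},
]
```

Only the first route in the example would be programmed into the kernel; the metallb-learned one is still kept in BIRD's table and can be announced onwards.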
All routes pass through without needing route reflectors or a full mesh,
as we use eBGP over private ASNs in our fabric.
We also have to make Calico aware of metallb pools - otherwise, routes
announced by metallb end up being filtered by Calico.
This is all mildly hacky. Here's hoping that Calico will some day gain
metallb-like functionality, ie. IPAM for
externalIPs/LoadBalancers/...
There does, however, seem to be one problem with this change (but I'm
not fixing it yet as it's not critical): metallb would previously only
announce IPs from nodes that were serving that service. Now, however,
the Calico internal mesh makes those appear from every node. This can
probably be fixed by disabling local meshing and enabling route
reflection on dcsw01 (to recreate the mesh routing through dcsw01). Or,
maybe by some more hacking of the Calico BIRD config :/.
Change-Id: I3df1f6ae7fa1911dd53956ced3b073581ef0e836