1
0
Fork 0
Commit Graph

16 Commits (42b21ecd84b713faf1c65c4ddceb74f559fe94bd)

Author SHA1 Message Date
q3k e77f7717d4 k0: bump to 1.16.5
Change-Id: I548808ce4e0deb0513a1e00963f383d84b9d920c
2020-10-10 22:39:50 +02:00
q3k 1257389d3d k0: expose controller-manager and scheduler metrics
We want to be able to scrape controller-manager and scheduler metrics
into Prometheus. For that, each of them needs to:

 1) listen on a secure port
 2) have authn enabled

With this, any k8s user with the right permissions (and a bearer token
or TLS certificate) can come in and access metrics over a node's public
IP address. Access without a certificate/token gets thrown into the
system:anonymous user, which as no access to any API.

Change-Id: I267680f92f748ba63b6762e6aaba3c417446e50b
2020-10-10 16:00:15 +00:00
q3k 36224c617a clustercfg: show diff before switching to new configuration
This is mildly hacky, but lets us be more informed before we switch to a
new configuration.

Change-Id: I008f3f698db702f1e0992bd41a8d1050449d59b5
2020-10-10 16:00:11 +00:00
q3k 2e001e5046 k0: bump to 1.15.4
This notably fixes the annoying loopback issues that prevented hosts
from accessing externalip services with externalTrafficPolicy: local
from nodes that weren't running the service.

Which means, hopefuly, no more registry pull failures when
nginx-ingress gets misplaced!

Change-Id: Id4923fd0fce2e28c31a1e65518b0e984165ca9ec
2020-10-03 16:32:38 +00:00
q3k fbe234bdb2 cluster: rename module-* into modules/*
Change-Id: I65e06f3e9cec2ba0071259eb755eddbbd1025b97
2020-10-03 14:57:30 +00:00
q3k 316411790a cluster/nix: update nodes
- we update NixOS to 20.09pre
 - we fix an ACME option that's now required
 - we switch from systemd-timesyncd to chrony (as timesyncd took a long
   time to sync clocks after restart, leading to MON_CLOCK_SKEW errors
   from ceph)

This has been deployed in production.

Change-Id: Ibfcd41567235bae3e3d8abeeed61f4694ae614ad
2020-08-23 00:58:29 +02:00
q3k d5918c8e72 cluster: change q3k's laptop key
Paranoia is dead, long live Mimeomia.

This has already been deployed to production.

Change-Id: Ibbc5015b5277380a3450f76e62d3fab6e71be1a0
2020-08-22 22:29:42 +02:00
q3k d7364520e9 cluster: bump kubelets to 1.14.3
Change-Id: I02ed978a49629cdfc3f3587ad640e8cc5a5fad23
2020-02-02 23:43:28 +01:00
q3k e2095b2ce9 cluster: remove unused module-cluster.nix
Change-Id: I819d803fc7454cfd63a11a109ec73c9578f598b8
2020-02-02 23:43:00 +01:00
q3k c78cc13528 cluster/nix: locally build nixos derivations
We change the existing behaviour (copy files & run nixos-rebuild switch)
to something closer to nixops-style. This now means that provisioning
admin machines need Nix installed locally, but that's probably an okay
choice to make.

The upside of this approach is that it's easier to debug and test
derivations, as all data is local to the repo and the workstation, and
deploying just means copying a configuration closure and switching the
system to it. At some point we should even be able to run the entire
cluster within a set of test VMs.

We also bump the kubernetes control plane to 1.14. Kubelets are still at
1.13 and their upgrade is comint up today too.

Change-Id: Ia9832c47f258ee223d93893d27946d1161cc4bbd
2020-02-02 22:31:53 +01:00
q3k 050af01b83 cluster: add q3k's new SSH key
Change-Id: I872a75cc89a62c9487433fa5e8e5767953e309c9
2019-12-17 01:58:58 +01:00
q3k d493ab66ca *: add dcr01s{22,24}
Change-Id: I072e825e2e1d199d9da50b9d38a9ffba68e61182
2019-10-31 17:07:50 +01:00
q3k 42553cd044 cluster: disable unauthenticated read only port on kubelets
This port was leaking kubelet state, including information on running
pods. No secrets were leaked (if they were not text-pasted into
env/args), but this still shouldn't be available.

As far as I can tell, nothing depends on this port, other than some
enterprise load balancers that require HTTP for node 'health' checks.

Change-Id: I9549b73e0168fe3ea4dce43cbe8fdc2ca4575961
2019-09-02 16:33:02 +02:00
q3k b13b7ffcdb prod{access,vider}: implement
Prodaccess/Prodvider allow issuing short-lived certificates for all SSO
users to access the kubernetes cluster.

Currently, all users get a personal-$username namespace in which they
have adminitrative rights. Otherwise, they get no access.

In addition, we define a static CRB to allow some admins access to
everything. In the future, this will be more granular.

We also update relevant documentation.

Change-Id: Ia18594eea8a9e5efbb3e9a25a04a28bbd6a42153
2019-08-30 23:08:18 +02:00
q3k d07861b7df ceph-waw1 -> ceph-waw2
Change-Id: I03d6244b9697a9efc06492114ef90cdb01e17601
2019-08-08 17:49:31 +02:00
q3k 116da981c9 nix/ -> cluster/nix/
These are related to cluster bootstrapping, not generic language
libraries (like go/ and bzl/).

Change-Id: I03a83c64f3e0fa6cb615d36b4e618f5e92d886ec
2019-07-21 15:53:20 +02:00