Some pods got evicted. Some of them broke.
- postgres in matrix and nginx in internet because of the new policies
(chown issues)
- cas proxy in matrix because apparently the image was not reuploaded
to the regsitry after ceph-waw1 died, and another node didn't have it
- registry because it had a weak image pin an downgraded to some
broken version on another node
Change-Id: I836036872629843c8ede1b7f67982112c90d71f0
Here we introduce benji [1], a backup system based on backy2. It lets us
backup Ceph RBD objects from Rook into Wasabi, our offsite S3-compatible
storage provider.
Benji runs as a k8s CronJob, every hour at 42 minutes. It does the
following:
- runs benji-pvc-backup, which iterates over all PVCs in k8s, and backs
up their respective PVs to Wasabi
- runs benji enforce, marking backups outside our backup policy [2] as
to be deleted
- runs benji cleanup, to remove unneeded backups
- runs a custom script to backup benji's sqlite3 database into wasabi
(unencrypted, but we're fine with that - as the metadata only contains
image/pool names, thus Ceph PV and pool names)
[1] - https://benji-backup.me/index.html
[2] - latest3,hours48,days7,months12, which means the latest 3 backups,
then one backup for the next 48 hours, then one backup for the next
7 days, then one backup for the next 12 months, for a total of 65
backups (deduplicated, of course)
We also drive-by update some docs (make them mmore separated into
user/admin docs).
Change-Id: Ibe0942fd38bc232399c0e1eaddade3f4c98bc6b4
Prodaccess/Prodvider allow issuing short-lived certificates for all SSO
users to access the kubernetes cluster.
Currently, all users get a personal-$username namespace in which they
have adminitrative rights. Otherwise, they get no access.
In addition, we define a static CRB to allow some admins access to
everything. In the future, this will be more granular.
We also update relevant documentation.
Change-Id: Ia18594eea8a9e5efbb3e9a25a04a28bbd6a42153
We accidentally created crdb-waw2 in
https://gerrit.hackerspace.pl/c/hscloud/+/2.
We remove it now and also backport a manual change that makes the
crdb-waw1 service public via a LoadBalancer.
Change-Id: I3bbd6f01b82c6efa458cc44776f086ba36e9f20c
This makes a registry be automatically part of the cluster
infrastructure.
Tested by running kubecfg diff, no diffs (apart from out-of-date ACLs)
found.
Change-Id: Ic0635e789cf3fb851f410bcf2865326f1fa87545
IP addresses are not necessary in the topology definitions of a
cockroach cluster.
They were mis-commited leftovers from trying to run the cluster on
DaemonSets with hostNetworking: true.
Change-Id: I4ef1f6ed9a745efc6b05846bc13aba9d1f8dc7c8