
147 Commits (6f0d852568b02020e5528e109284041ce18d8eb0)

Author SHA1 Message Date
q3k a01c487a6e cluster: allow insecure pods in rook-ceph-system
This is required for the agent to start a socket on each host for
kubelet-to-rook access.

Change-Id: I78529df81185aeaacdcb494138f72f0224a029c6
2019-09-05 16:01:19 +00:00
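For context, allowing "insecure" pods like this usually boils down to a permissive PodSecurityPolicy bound to the namespace's service accounts. A minimal jsonnet sketch, with all names and the exact set of relaxed fields assumed rather than taken from the actual change:

  // Hypothetical sketch - not the actual hscloud change. A permissive PSP
  // plus a binding granting its use to every service account in
  // rook-ceph-system, so the rook agent can create its hostPath socket.
  {
    psp: {
      apiVersion: "policy/v1beta1",
      kind: "PodSecurityPolicy",
      metadata: { name: "rook-ceph-system-insecure" },
      spec: {
        privileged: true,
        allowedCapabilities: ["*"],
        volumes: ["*"],  // includes hostPath, needed for the kubelet-facing socket
        hostNetwork: true,
        runAsUser: { rule: "RunAsAny" },
        seLinux: { rule: "RunAsAny" },
        supplementalGroups: { rule: "RunAsAny" },
        fsGroup: { rule: "RunAsAny" },
      },
    },
    role: {
      apiVersion: "rbac.authorization.k8s.io/v1",
      kind: "ClusterRole",
      metadata: { name: "rook-ceph-system-insecure" },
      rules: [{
        apiGroups: ["policy"],
        resources: ["podsecuritypolicies"],
        resourceNames: ["rook-ceph-system-insecure"],
        verbs: ["use"],
      }],
    },
    binding: {
      apiVersion: "rbac.authorization.k8s.io/v1",
      kind: "RoleBinding",
      metadata: { name: "rook-ceph-system-insecure", namespace: "rook-ceph-system" },
      roleRef: {
        apiGroup: "rbac.authorization.k8s.io",
        kind: "ClusterRole",
        name: "rook-ceph-system-insecure",
      },
      subjects: [{
        apiGroup: "rbac.authorization.k8s.io",
        kind: "Group",
        name: "system:serviceaccounts:rook-ceph-system",
      }],
    },
  }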
q3k 13bb1bf4e3 Get in the Cluster, Benji!
Here we introduce benji [1], a backup system based on backy2. It lets us
back up Ceph RBD objects from Rook into Wasabi, our offsite S3-compatible
storage provider.

Benji runs as a k8s CronJob, every hour at 42 minutes. It does the
following:
 - runs benji-pvc-backup, which iterates over all PVCs in k8s, and backs
   up their respective PVs to Wasabi
 - runs benji enforce, marking backups outside our backup policy [2] as
   to be deleted
 - runs benji cleanup, to remove unneeded backups
 - runs a custom script to back up benji's sqlite3 database into wasabi
   (unencrypted, but we're fine with that, as the metadata only contains
   image/pool names, i.e. Ceph PV and pool names)

[1] - https://benji-backup.me/index.html
[2] - latest3,hours48,days7,months12, which means: the latest 3 backups,
      then one backup per hour for the next 48 hours, one per day for the
      next 7 days, and one per month for the next 12 months, for a total
      of 65 backups (deduplicated, of course)

We also drive-by update some docs (making them more clearly separated
into user/admin docs).

Change-Id: Ibe0942fd38bc232399c0e1eaddade3f4c98bc6b4
2019-09-02 16:33:02 +02:00
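A rough jsonnet sketch of the CronJob shape described above; only the schedule and the four steps come from the commit message, while the namespace, image and entrypoint are assumptions:

  {
    apiVersion: "batch/v1beta1",
    kind: "CronJob",
    metadata: { name: "benji", namespace: "ceph-waw2" },  // namespace assumed
    spec: {
      schedule: "42 * * * *",       // every hour, at minute 42
      concurrencyPolicy: "Forbid",  // don't let backup runs overlap (assumed)
      jobTemplate: {
        spec: {
          template: {
            spec: {
              restartPolicy: "Never",
              containers: [{
                name: "benji",
                image: "registry.k0.hswaw.net/example/benji:latest",  // image path assumed
                // Hypothetical wrapper script; assumed to run, in order:
                // benji-pvc-backup, benji enforce, benji cleanup, and the
                // sqlite3-metadata upload to Wasabi described above.
                command: ["/usr/local/bin/benji-backup-all"],
              }],
            },
          },
        },
      },
    },
  }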
q3k 9496d9910a cluster: add nextcloud user for object store
Change-Id: Ib08be16f71ff5e1b72ca6ad436de4b12427dd407
2019-09-02 16:33:02 +02:00
q3k 896926c921 prodvider: clean up LDAP connections
Change-Id: Ic95e6d1b845832fa0fb2da51b418bcdcb8fd05c4
2019-08-31 15:00:51 +02:00
q3k 71a21c7693 rook/ceph: bump
Change-Id: I046df292cad11650adb829cc8a73100cc1d1ecc8
2019-08-30 23:08:26 +02:00
q3k b13b7ffcdb prod{access,vider}: implement
Prodaccess/Prodvider allow issuing short-lived certificates for all SSO
users to access the kubernetes cluster.

Currently, all users get a personal-$username namespace in which they
have administrative rights. Otherwise, they get no access.

In addition, we define a static CRB to allow some admins access to
everything. In the future, this will be more granular.

We also update relevant documentation.

Change-Id: Ia18594eea8a9e5efbb3e9a25a04a28bbd6a42153
2019-08-30 23:08:18 +02:00
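In RBAC terms, the per-user access described above roughly amounts to a RoleBinding to the built-in admin ClusterRole inside each personal namespace, plus one static ClusterRoleBinding for the admins. A sketch with assumed names; the subject is assumed to match the identity in the short-lived prodvider certificate:

  // Hypothetical sketch of the per-user namespace grant.
  local personalAdmin(username) = {
    apiVersion: "rbac.authorization.k8s.io/v1",
    kind: "RoleBinding",
    metadata: {
      name: "namespace-admin",
      namespace: "personal-" + username,
    },
    roleRef: {
      apiGroup: "rbac.authorization.k8s.io",
      kind: "ClusterRole",
      name: "admin",  // built-in Kubernetes namespace-admin role
    },
    subjects: [{
      apiGroup: "rbac.authorization.k8s.io",
      kind: "User",
      name: username,
    }],
  };

  // Static, cluster-wide grant for a handful of admins (group name assumed).
  {
    example: personalAdmin("q3k"),
    adminsCRB: {
      apiVersion: "rbac.authorization.k8s.io/v1",
      kind: "ClusterRoleBinding",
      metadata: { name: "admins" },
      roleRef: {
        apiGroup: "rbac.authorization.k8s.io",
        kind: "ClusterRole",
        name: "cluster-admin",
      },
      subjects: [{
        apiGroup: "rbac.authorization.k8s.io",
        kind: "Group",
        name: "sso:admins",  // group name assumed
      }],
    },
  }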
q3k d16454badc cert-manager: bump to v0.9.1
We just got this email:

We've been working with Jetstack, the authors of cert-manager, on a
series of fixes to the client. Cert-manager sometimes falls into a
traffic pattern where it sends really excessive traffic to Let's
Encrypt's servers, continuously. To mitigate this, we plan to start
blocking all traffic from cert-manager versions less than 0.8.0 (the
current semver minor release), as of November 1, 2019. Please upgrade
all of your cert-manager instances before then.

We're sending this email because this is the contact address of your
cert-manager instance at:

 185.236.240.37 .

Version 0.8.0 is much better but we still observe excessive traffic in
some cases. We're working with Jetstack to improve these cases. As new
versions of cert-manager are released, we will add the non-current
versions to our block list after 3 months. We strongly encourage
cert-manager users to stay up-to-date with new versions.

Also, there is an opportunity to help both Jetstack and Let's Encrypt.
Once you've upgraded, please check the logs for your cert-manager
instances from time to time. Are they making excessive requests to Let's
Encrypt (more than, say, 10 per day over multiple days)? If so, please
share details at https://github.com/jetstack/cert-manager/issues/1948 .

Thanks,
Let's Encrypt Team

Change-Id: Ic7152150ac1c96941423878c6d4b6209e07429cf
2019-08-29 17:21:49 +02:00
q3k 1fad2e5c6e bgpwtf/cccampix: draw the rest of the fucking owl
Change-Id: I49fd5906e69512e8f2d414f406edc0179522f225
2019-08-11 23:43:25 +02:00
q3k d533892efa Fix crdb-waw1
We accidentally created crdb-waw2 in
https://gerrit.hackerspace.pl/c/hscloud/+/2.

We remove it now and also backport a manual change that makes the
crdb-waw1 service public via a LoadBalancer.

Change-Id: I3bbd6f01b82c6efa458cc44776f086ba36e9f20c
2019-08-11 23:42:47 +02:00
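The "public via a LoadBalancer" part is simply a Service of type LoadBalancer in front of the CockroachDB pods, with MetalLB handing out the external IP. A sketch using CockroachDB's default ports; names and labels are assumed:

  {
    apiVersion: "v1",
    kind: "Service",
    metadata: { name: "crdb-waw1-public", namespace: "crdb-waw1" },  // names assumed
    spec: {
      type: "LoadBalancer",
      selector: { app: "crdb-waw1" },  // label assumed
      ports: [
        { name: "sql", port: 26257, targetPort: 26257, protocol: "TCP" },
        { name: "http", port: 8080, targetPort: 8080, protocol: "TCP" },  // admin UI
      ],
    },
  }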
q3k d07861b7df ceph-waw1 -> ceph-waw2
Change-Id: I03d6244b9697a9efc06492114ef90cdb01e17601
2019-08-08 17:49:31 +02:00
q3k 4d61d20aec app/registry: integrate into cluster/kube
This makes the registry automatically part of the cluster
infrastructure.

Tested by running kubecfg diff; no diffs (apart from out-of-date ACLs)
were found.

Change-Id: Ic0635e789cf3fb851f410bcf2865326f1fa87545
2019-07-21 16:56:41 +02:00
q3k 92be486f39 Revert "cluster/kube/lib/nginx: use Local traffic policy"
This reverts commit 09a0f06d2a.

Reason for revert: prevents registry from being accessible on nodes:

q3k@anathema ~/Software/hscloud $ curl registry.k0.hswaw.net
<html>
[..., ok]

[root@bc01n03:~]# curl registry.k0.hswaw.net
^C

Change-Id: I0da97aaf7a8791ea3f62c70b6c1502f4a48a300f
2019-06-29 22:58:19 +00:00
q3k 09a0f06d2a cluster/kube/lib/nginx: use Local traffic policy
Diff against prod:

  - live services nginx-system.ingress-nginx
  + config services nginx-system.ingress-nginx
    {
      "apiVersion": "v1",
      "kind": "Service",
      "metadata": {
        "annotations": {},
        "labels": {
          "app.kubernetes.io/name": "ingress-nginx",
          "app.kubernetes.io/part-of": "ingress-nginx"
        },
        "name": "ingress-nginx",
        "namespace": "nginx-system"
      },
      "spec": {
  -     "externalTrafficPolicy": "Cluster",
  +     "externalTrafficPolicy": "Local",
        "ports": [
          {
            "name": "ssh",
            "port": 22,
            "protocol": "TCP",
            "targetPort": 22
          },
          {
            "name": "http",
            "port": 80,
            "protocol": "TCP",
            "targetPort": 80
          },
          {
            "name": "https",
            "port": 443,
            "protocol": "TCP",
            "targetPort": 443
          }
        ],
        "selector": {
          "app.kubernetes.io/name": "ingress-nginx",
          "app.kubernetes.io/part-of": "ingress-nginx"
        },
        "type": "LoadBalancer"
      }
    }

Change-Id: I0dd66e3f1643efa975d6180cc163a265d4b484ef
2019-06-29 22:44:53 +02:00
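The only semantic change in the diff above is the externalTrafficPolicy field: "Local" preserves client source IPs and avoids an extra SNAT hop, but traffic to the service IP is only answered by nodes that actually run an ingress-nginx pod, which is consistent with the registry being unreachable from bc01n03 in the revert above. A trimmed sketch of just that field:

  {
    apiVersion: "v1",
    kind: "Service",
    metadata: { name: "ingress-nginx", namespace: "nginx-system" },
    spec: {
      type: "LoadBalancer",
      // "Cluster" (the default) forwards to any node and SNATs the client;
      // "Local" keeps the client IP but drops traffic on nodes without a
      // local ingress-nginx endpoint.
      externalTrafficPolicy: "Local",
      selector: { "app.kubernetes.io/name": "ingress-nginx" },
      ports: [{ name: "http", port: 80, targetPort: 80, protocol: "TCP" }],
    },
  }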
q3k 543b412a65 cluster/kube/lib/nginx: add gerrit forwarding
This has already been running in production since Gerrit was deployed -
it just got lost during submit.

Change-Id: I8a1580b1ca3ec3142a8fa4320dc9f51a599a914f
2019-06-29 22:42:39 +02:00
q3k 184678b0f4 cluster/kube/lib/cockroachdb: clean up topology
IP addresses are not necessary in the topology definitions of a
cockroach cluster.

They were mis-committed leftovers from trying to run the cluster on
DaemonSets with hostNetworking: true.

Change-Id: I4ef1f6ed9a745efc6b05846bc13aba9d1f8dc7c8
2019-06-22 21:18:29 +00:00
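A sketch of what the cleaned-up topology might look like; the structure and field names here are assumptions, the point is only that node names are enough and no per-instance IP addresses are needed:

  // Hypothetical topology for the library; only node names, no IPs.
  // Host names are illustrative (bc01n03 is the only one named in this log).
  local topology = [
    { name: "crdb-1", node: "bc01n01.hswaw.net" },
    { name: "crdb-2", node: "bc01n02.hswaw.net" },
    { name: "crdb-3", node: "bc01n03.hswaw.net" },
  ];
  topology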
q3k dec401c7dd cluster/kube/lib/cockroach: move client to deployment
This prevents a bug where kubecfg fails to update the client pod when
running a cluster/kube/cluster.jsonnet update. The pod update is
attempted because of runtime/intent differences in the serviceAccounts
specification, which causes kubecfg to see a diff, which causes it to
attempt an update, which causes kube-apiserver to reject the change
(because pods are immutable), which causes kubecfg to fail.

Change-Id: I20b0ecbb264213a2eb483d475c7683b4965c82be
2019-06-22 23:14:25 +02:00
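Wrapping the client in a Deployment sidesteps the immutability problem: kubecfg updates the pod template and the Deployment controller replaces the pod, instead of kubecfg trying to mutate a live pod in place. A sketch with assumed names, namespace and image version:

  {
    apiVersion: "apps/v1",
    kind: "Deployment",
    metadata: { name: "crdb-waw1-client", namespace: "crdb-waw1" },  // names assumed
    spec: {
      replicas: 1,
      selector: { matchLabels: { app: "crdb-waw1-client" } },
      template: {
        metadata: { labels: { app: "crdb-waw1-client" } },
        spec: {
          containers: [{
            name: "client",
            image: "cockroachdb/cockroach:v19.1.0",  // version assumed
            // Keep the pod around so operators can exec a SQL shell into it.
            command: ["sleep", "2147483647"],
          }],
        },
      },
    },
  }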
q3k c7258f4644 cluster/kube: refactor, add crdb-waw1 2019-06-21 00:24:09 +02:00
q3k e53e39a8be cluster/kube/lib/cockroachdb: use manual node pinning
We move away from the StatefulSet based deployment to manually starting
a deployment per intended node. This allows us to pin individual
instances of Cockroach to particular nodes, so that they stay
co-located with their data.
2019-06-20 23:36:35 +02:00
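The pinning itself can be done with a plain nodeSelector on the standard kubernetes.io/hostname label, one Deployment per intended node. A sketch; names, namespace, image and the exact cockroach flags are assumptions:

  // Hypothetical single-node member; in the real library this would be
  // instantiated once per entry in the cluster's topology.
  local member(name, node) = {
    apiVersion: "apps/v1",
    kind: "Deployment",
    metadata: { name: name, namespace: "crdb-waw1" },
    spec: {
      replicas: 1,
      strategy: { type: "Recreate" },  // never two instances on the same data
      selector: { matchLabels: { app: name } },
      template: {
        metadata: { labels: { app: name } },
        spec: {
          // Pin this instance to one physical node so it stays co-located
          // with its data.
          nodeSelector: { "kubernetes.io/hostname": node },
          containers: [{
            name: "cockroachdb",
            image: "cockroachdb/cockroach:v19.1.0",  // version assumed
            command: ["/cockroach/cockroach", "start"],  // join/cert flags elided
          }],
        },
      },
    },
  };
  member("crdb-1", "bc01n01.hswaw.net")  // host name illustrative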
q3k 662a3cdcca cluster/kube/lib/cockroachdb: refactor
We refactor this library to:

 - support multiple databases, but with a strong suggestion of having
   one per k8s cluster
 - drop the database creation logic
 - redo naming (allowing for two options: multiple clusters per
   namespace or an exclusive namespace for the cluster)
 - unhardcode dns names
2019-06-20 19:45:03 +02:00
q3k 224a50bbfe cluster/kube/lib/cockroach: fix imports 2019-06-20 16:43:01 +02:00
q3k 3c117fa841 make cockroachdb into a cluster service 2019-06-20 16:43:01 +02:00
q3k c3b0f7627c cluster/kube: set operator replicas to 0 2019-06-20 16:42:19 +02:00
q3k 36cc4fb61a bazel-cache: deploy, add waw-hdd-yolo-1 ceph pool 2019-05-17 18:09:39 +02:00
informatic fc514a9b52 cluster/kube/cert-manager: don't add APIService when webhooks are disabled 2019-05-05 12:12:13 +02:00
informatic b187bf5b2c cluster/kube/metallb: downgrade to 0.7.3 2019-05-05 12:11:14 +02:00
q3k 321fad9865 cluster/kube/rook: lower debug 2019-04-19 14:14:36 +02:00
q3k ed2e670c8b cluster/kube/rook: bump to ceph v14 fully 2019-04-19 13:27:20 +02:00
informatic 5ac85c6e73 cluster/kube: refactor rook.io object store configuration 2019-04-09 21:45:32 +02:00
informatic 6da3b288dc WIP: app/registry: ceph object storage 2019-04-09 13:48:21 +02:00
q3k 73cef11c85 *: rejigger tls certs and more
This pretty large change does the following:

 - moves nix from bootstrap.hswaw.net to nix/
 - changes clustercfg to use cfssl and moves it to cluster/clustercfg
 - changes clustercfg to source information about target location of
   certs from nix
 - changes clustercfg to push nix config
 - changes tls certs to have more than one CA
 - recalculates all TLS certs
   (it keeps the old serviceaccounts key, otherwise we end up with
   invalid serviceaccounts - the cert doesn't match, but who cares,
   it's not used anyway)
2019-04-07 00:06:23 +02:00
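With cfssl, "more than one CA" mostly means one CSR/config pair per CA. A sketch of what a single CA's CSR definition looks like in cfssl's JSON format (which is also valid jsonnet); the CA name, subject fields and key parameters are all assumptions, not the repository's actual values:

  {
    CN: "hscloud kube CA",            // CA name assumed
    key: { algo: "rsa", size: 4096 }, // key parameters assumed
    names: [{
      C: "PL",
      O: "Warsaw Hackerspace",
      OU: "cluster CA",
    }],
    ca: { expiry: "87600h" },         // lifetime assumed
  }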
q3k 242152f65e cluster/kube/lib/metallb: bump memory hoping to prevent crashes 2019-04-04 16:54:00 +02:00
informatic 3187c59a86 cluster/kube: ceph dashboard tls certificates 2019-04-02 14:44:04 +02:00
informatic 2afe604595 cluster/kube: minor cert-manager cleanups, disable webhooks by default 2019-04-02 14:43:34 +02:00
informatic 79ddbc57d9 cluster/kube: initial cert-manager implementation 2019-04-02 13:20:15 +02:00
q3k 65f3b1d8ab cluster/kube: add waw-hdd-redundant-1 pool/storageclass 2019-04-02 01:05:38 +02:00
q3k c6da127d3f cluster/kube: ceph-waw1 up 2019-04-02 00:06:13 +02:00
q3k cdfafaf91e cluster/kube: finish rook operator 2019-04-01 19:16:18 +02:00
q3k b7fcc67f42 cluster/kube: start implementing rook 2019-04-01 18:40:50 +02:00
q3k 14cbacb81a cluster/kube/metallb: parametrize address pools 2019-04-01 18:00:44 +02:00
q3k a9c7e86687 cluster: fix metallb, add nginx ingress controller 2019-04-01 17:56:28 +02:00
q3k 1e565dc4a5 cluster: start implementing metallb 2019-01-18 09:40:59 +01:00
q3k e3af1eb852 cluster: autodetect IP address
This is so that Calico starts with the proper subnet. Feeding it just an
IP from the node status will mean it parses it as /32 and uses IPIP
tunnels for all connectivity.
2019-01-18 09:39:57 +01:00
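Concretely, this means not feeding a bare address from the node status into the calico-node container, but letting it autodetect; in calico-node terms that is the IP / IP_AUTODETECTION_METHOD environment variables. A sketch of just those entries; the method value below is an assumption:

  // Relevant calico-node env entries; everything else elided.
  local calicoNodeEnv = [
    { name: "IP", value: "autodetect" },
    // Upstream methods include "interface=..." and "can-reach=..."; which
    // one this cluster actually uses is an assumption.
    { name: "IP_AUTODETECTION_METHOD", value: "interface=bond0" },
  ];
  calicoNodeEnv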
q3k 5c75574464 cluster/coredns: allow resolving via <svc>.<namespace>.svc.k0.hswaw.net 2019-01-17 21:35:10 +01:00
q3k af3be426ad cluster: deploy calico and metrics service 2019-01-17 18:57:19 +01:00
q3k 49b9a13d28 cluster: deploy coredns 2019-01-14 00:02:59 +01:00
q3k 5bebbebe3e cluster/kube: fix typo 2019-01-13 22:08:05 +01:00
q3k 4d9e72cb8c cluster/kube: init 2019-01-13 22:06:33 +01:00