hscloud

cheshire


Author	SHA1	Message	Date
radex	b8d4a8a902	ldapweb: migrate from mirko to standalone Change-Id: I169598232b39b99bfd2d4ff3799b44083ba77e84 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1623 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-09-22 21:54:20 +00:00
radex	c2c66bf770	cluster/kube: update admitomatic settings for inventory Change-Id: I62279519f93da338591b1b164878e33027b8f851 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1576 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-08-17 12:39:56 +00:00
q3k	03c2d996a0	cluster: fix prodvider deploy (after new CA) Change-Id: Icbdb5e3ac592e9eac3a033ba50af401b706c3e78 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1541 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-07-24 14:15:46 +00:00
informatic	10384cd394	cluster/registry: fix common namespaces Public pull ACL in the middle had priority over our more specific rules - moving these to the top fixes common registry namespace ACLs. Change-Id: Ia6f05cef09c0db4eb71155d2c0e2d9944b81f903 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1522 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-06-19 23:15:37 +00:00
q3k	c1f372561a	cluster/admitomatic: implement opt-out namespaces Change-Id: I32d4b019211fa755e2b3b103b88ea3f4c14e500f Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1521 Reviewed-by: informatic <informatic@hackerspace.pl>	2023-06-19 22:54:33 +00:00
informatic	7e841065b0	*: post-certmanager manifests update Change-Id: I745c850268c31777c5722a9833c8152a55615aed Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1512 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-06-19 21:20:44 +00:00
q3k	3dd3ff5dcd	cluster/cert-manager: update to v1.5.0 Change-Id: I7a4cdadc9956141292302bc004d09d6e9e22855e Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1497 Reviewed-by: informatic <informatic@hackerspace.pl>	2023-05-26 10:38:16 +00:00
q3k	57df027f28	cluster/kube: add k0-cert-manager.jsonnet view Change-Id: I4d008839f6d6190d0d88fd3fff44974c4f2db2c0 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1499 Reviewed-by: implr <implr@hackerspace.pl>	2023-04-01 14:58:50 +00:00
q3k	989dfa3183	cluster/kube: add k0-prodvider.jsonnet view Change-Id: I170fbef3008f906c26ed79387858c3c1e4e2e10c Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1496 Reviewed-by: implr <implr@hackerspace.pl>	2023-04-01 13:54:49 +00:00
q3k	7572f0790c	k0: add disks Already deployed, now rebalancing. Change-Id: I536a063bc346effd07a1700aeffe598cc35f6f7a Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1493 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-04-01 11:21:54 +00:00
q3k	073d850a95	cluster/prodvider: redeploy Change-Id: I7a6cce06bb7c2f495d5354d3a2bebef64e307e42 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1491 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-04-01 11:18:25 +00:00
q3k	3a6d67e0c4	cluster/prodvider: rewrite against x509 lib for ed25519 support This gets rid of cfssl for the kubernetes bits of prodvider, instead using plain crypto/x509. This also allows us to support our new fancy ED25519 CA. Change-Id: If677b3f4523014f56ea802b87499d1c0eb6d92e9 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1489 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-03-31 22:53:59 +00:00
q3k	712a5dc3e3	cluster: add bc01n05.hswaw.net This will be our postgres pet machine. Change-Id: Ifff6648394ca6407fb5b5daa853f4abc42541703 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1467 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-03-04 22:26:46 +00:00
implr	0156ab24ca	cluster/kube/k0: remove implr-spark bucket, add implr bucket the spark one has been an abandoned experiment from years ago, and I could use a personal one right now Change-Id: I78a706c3371d441b2f8460fd796d0cfd9a198cc6 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1464 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-02-26 16:41:23 +00:00
implr	0173f501d7	cockroach: v20.2 -> v21.1 Following https://www.cockroachlabs.com/docs/v21.1/upgrade-cockroach-version?filters=linux --logtostderr is deprecated/removed, but AFAICT from the default config it will still log there: https://www.cockroachlabs.com/docs/v21.1/configure-logs#default-logging-configuration Change-Id: I7fb3f835693f955b37de24dc581140ea34b11630 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1461 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-01-30 21:16:42 +00:00
informatic	3b2a2a2ce1	cluster/k0: add paperless to admitomatic config Change-Id: I54df444cddca8a05febfb96af07b9e2f614639fc Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1453 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-01-05 09:12:18 +00:00
q3k	d171263d6e	k0: remove waw-hdd-yolo-3 This was never used and only caused scary warnings during OSD reboots due to lack of availability. Change-Id: I14eacd88855bc56e06f2a61cc2d914d985330852 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1423 Reviewed-by: implr <implr@hackerspace.pl>	2022-11-20 12:28:20 +00:00
implr	4d98cf5ca8	calico: move from etcd to crd Leaving the CRD definitions as YAML, extracted without modifications from the original install file - this should make upgrades simpler. Change-Id: I7211d2711e2af014b36dd887a951abb9e1032eb9 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1179 Reviewed-by: q3k <q3k@hackerspace.pl>	2022-11-19 21:40:34 +00:00
q3k	16842119d1	app/mastodon: deploy Change-Id: I88c104d1a8d5627355b01a8c48dc235635fca5ed Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1421 Reviewed-by: implr <implr@hackerspace.pl>	2022-11-18 12:15:22 +00:00
q3k	437b0c335f	rook: fix benji This unforks benji back into upstream. The old fork didn't support a new authentication method on Ceph, and we don't have multiple clusters anymore (so we don't need the functionality of the fork). Change-Id: Ie79313b2321ca2e22ad2874b75a71385af95105f Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1321 Reviewed-by: informatic <informatic@hackerspace.pl>	2022-06-19 11:49:12 +00:00
q3k	b0e3693c0e	cluster/kube: calico: fix etcd endpoints Change-Id: Ia93d355ca343fa5a42ec37fbcae9135cb5304f6e Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1285 Reviewed-by: implr <implr@hackerspace.pl>	2022-06-11 19:00:52 +00:00
implr	54a34b24a1	cluster/k0: ceph: add tape staging Change-Id: I7fdba86b15f92157888850d2905440b45fb36f17 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1263 Reviewed-by: q3k <q3k@hackerspace.pl>	2022-03-05 22:45:29 +00:00
patryk	d0a0b18e54	cluster: allow namespace admins to access certificate resources Change-Id: I532dadfe1799da43d12598e388141f8f9a3872de Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1250 Reviewed-by: q3k <q3k@hackerspace.pl>	2022-02-05 15:08:47 +00:00
q3k	bdd403c587	cluster: k0: move cockroachdb away from bc01n01, fixup joins Reminded by a power failure on bc01n0{1,2}, we migrate away from at least one of them into another server. We also fix up the startup join parameter to not include the node itself (which is not necessary, but a nice thing to have nonetheless). Since bc01n01 was the initial node of the cluster, we also disable the init job for k0 (which we don't care about anyway). Change-Id: I3406471c0f9542e9d802d39138e400b5a5e74794 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1176 Reviewed-by: q3k <q3k@hackerspace.pl>	2021-12-13 22:30:46 +00:00
implr	eca1e080d7	calico: restore CNI_NET_DIR Change-Id: I04e17f8639505f5b7cc42e86392abc175b7922db Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1178 Reviewed-by: q3k <q3k@hackerspace.pl>	2021-12-03 03:10:13 +00:00
implr	12f176c1eb	calico 3.14 -> 1.15 Change-Id: I9eceaf26017e483235b97c8d08717d2750fabe25 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/995 Reviewed-by: q3k <q3k@hackerspace.pl>	2021-11-20 22:12:52 +00:00
informatic	e839f95079	cluster/kube/k0: add matrix and informatic personal ceph users Change-Id: Ied8d474709b8053e9fc339435d3ca1ca5fdfa710	2021-09-14 22:21:22 +02:00
q3k	4b8ee32246	cluster/kube: always enable flexdriver Documentation says [1] this is disabled by default in 1.1, but that documentation kinda lies [2]. [1] - `235d5a384b/Documentation/flexvolume.md (ceph-flexvolume-configuration)` [2] - `64e28af741 (diff-d1eb5cba50e3770b61ccd3c730cd40514053e1da0233dfe09b5e7967e76a2a6cL424-L425)` Change-Id: Ia92c99e137ed751db62c0f56d42c4901986d0bb8	2021-09-14 21:39:39 +02:00
q3k	38f72fe094	cluster: k0: move ceph-waw3 to proper realm/zonegroup With this we can use Ceph's multi-site support to easily migrate to our new k0 Ceph cluster. This migration was done by using radosgw-admin to rename the existing realm/zonegroup to the new names (hscloud and eu), and then reworking the jsonnet so that the Rook operator would effectively do nothing. It sounds weird that creating a bunch of CRs like Object{Realm,ZoneGroup,Zone} would be a no-op for the operator, but that's how Rook works - a CephObjectStore generally creates everything that the above CRs would create too, but implicitly. Adding the extra CRs just allows specifying extra settings, like names. (it wasn't fully a no-op, as the rgw daemon is parametrized by realm/zonegroup/zone names, so that had to be restarted) We also make the radosgw serve under object.ceph-eu.hswaw.net, which allows us to right away start using a zonegroup URL instead of the zone-only URL. Change-Id: I4dca55a705edb3bd28e54f50982c85720a17b877	2021-09-14 21:39:39 +02:00
q3k	085a8ff247	cluster: k0: upgrade to ceph 16.2.5 This was fun. See b/6 for a log of how swimmingly this went. Change-Id: I96c3c18b5d33ef86523b3506f49a390419e9ca7f	2021-09-14 21:39:39 +02:00
q3k	464fb04f39	cluster: k0: bump rook to 1.6 This is needed to get Rook to talk to an external Ceph 16/Pacific cluster. This is mostly a bunch of CRD/RBAC changes. Most notably, we yeet our own CRD rewrite and just slurp in upstream CRD defs. Change-Id: I08e7042585722ae4440f97019a5212d6cf733fcc	2021-09-14 21:39:37 +02:00
q3k	6579e842b0	kartongips: paper over^W^Wfix CRD updates Ceph CRD updates would fail with: ERROR Error updating customresourcedefinitions cephclusters.ceph.rook.io: expected kind, but got map This wasn't just https://github.com/bitnami/kubecfg/issues/259 . We pull in the 'solution' from Pulumi (https://github.com/pulumi/pulumi-kubernetes/pull/622) which just retries the update via a JSON update instead, and that seems to have worked. We also add some better error return wrapping, which I used to debug this issue properly. Oof. Change-Id: I2007a7857e44128d74760174b61b59efa58e9cbc	2021-09-11 20:54:34 +00:00
q3k	4f0468fa26	cluster/kube: remove ceph diff against k0 production This now has a zero diff against prod. location fields in CephCluster.storage.nodes seem to have been removed from the CRD at some point. Not sure how the CRUSH tree now gets populated, but whatever, it's been working like this for a while already. Same for CephObjectStore.gateway.type. The Rook Operator has been zero-scaled for a while now due to b/6. Change-Id: I30a836f273f4c1529f60fa9297c96b7aac412f59	2021-09-11 12:43:53 +00:00
q3k	89a16f4de4	cluster/admitomatic: allow use-regex n-i-c annotation This annotation is used to permit routes defined by regexes instead of simple prefix matching. This is used by our synapse deployment for routing incoming HTTP requests to different Synapse components. I stumbled upon this while deploying a new Matrix/Synapse instance. This hasn't been a problem yet because the existing ingresses for Matrix deployments predate admitomatic. Change-Id: I821e58b214450ccf0de22d2585c3b0d11fbe71c0	2021-06-06 12:58:11 +00:00
q3k	7251f2720e	Merge changes Ib068109f,I9a00487f,I1861fe7c,I254983e5,I3e2bedca, ... * changes: cluster/identd/ident: update README cluster/kube: deploy identd cluster/identd: implement cluster/identd/kubenat: implement cluster/identd/cri: import cluster/identd/ident: add TestE2E cluster/identd/ident: add Query function cluster/identd/ident: add IdentError cluster/identd/ident: add basic ident protocol server cluster/identd/ident: add basic ident protocol client	2021-05-28 23:08:10 +00:00
q3k	2414afe3c0	cluster/kube: deploy identd Change-Id: I9a00487fc4a972ecb0904055dbaaab08221062c1	2021-05-26 19:46:09 +00:00
q3k	e17f7edde0	cluster/kube: nginx: add Hscloud-Nic-Source-* headers These can be used by production jobs to get the source port of the client connecting over HTTP. A followup CR implements just that. Change-Id: Ic8e29eaf806bb196d8cfcfb604ff66ae4d0d166a	2021-05-22 19:16:39 +00:00
q3k	ba2f4d8215	cluster/prodvider: deploy Change-Id: I01d931a664e4b09c0d75fb01fb3f2528bc0f1a53	2021-05-19 22:13:26 +00:00
q3k	5ae5cbec81	Merge "cluster/kube: bump nginx-ingress-controller, backport openssl 1.1.1k"	2021-05-19 15:34:45 +00:00
q3k	99b91b11f1	cluster/k0/admitomatic: add .hswaw.net to hswaw-prod namespace This was preventing certificate refresh in the hswaw-prod mirko ingress. Change-Id: I14b18b642a3948a9864e2d9a90b2a2b2c145b9b1	2021-03-28 17:34:34 +00:00
q3k	2e8d24b84a	cluster/kube: bump nginx-ingress-controller, backport openssl 1.1.1k This fixes CVE-2021-3450 and CVE-2021-3449. Deployed on prod: $ kubectl -n nginx-system exec nginx-ingress-controller-5c69c5cb59-2f8v4 -- openssl version OpenSSL 1.1.1k 25 Mar 2021 Change-Id: I7115fd2367cca7b687c555deb2134b22d19a291a	2021-03-25 18:16:13 +00:00
q3k	bf266c6aaf	cluster/k0: add dns crdb user In preparation for running PowerDNS on k0. Change-Id: I853c7465a6a32d02628fa6cfdeb445eb9937b3be	2021-03-17 21:49:00 +00:00
q3k	3b8935378a	cluster/crdb: make init job 'idempotent' This enables its redeployment with a newer crdb image. Change-Id: If039992674f401af53738c80d22cc2ca2818fe00	2021-03-17 21:48:30 +00:00
q3k	64de7afe32	cluster/kube/k0: fix syntax errors This happened in `793ca1b3` and slipped past review. Change-Id: Ie31f0e1ec03d6e4545d6683b21f528550bf4ef9f	2021-03-17 21:47:51 +00:00
q3k	793ca1b3b2	cluster/kube: limit OSDs in ceph-waw3 to 8GB RAM Each OSD is connected to a 6TB drive, and with the good ol' 1TB storage -> 1GB RAM rule of thumb for OSDs, we end up with 6GB. Or, to round up, 8GB. I'm doing this because over the past few weeks OSDs in ceph-waw3 have been using a _ton_ of RAM. This will probably not prevent that (and instead they will OOM more often :/), but it will at least prevent us from wasting resources (k0 started migrating pods to other nodes, and running full nodes like that without an underlying request makes for a terrible draining experience). We need to get to the bottom of why this is happening in the first place, though. Did this happen as we moved to containerd? Followup: b.hswaw.net/29 Already deployed to production. Change-Id: I98df63763c35017eb77595db7b9f2cce71756ed1	2021-03-07 00:09:58 +00:00
q3k	78d6f11cb2	Merge "cluster/admitomatic: allow whitelist-source-range"	2021-02-08 17:21:59 +00:00
q3k	877cf0af26	🅱️ Fixes b/8 Change-Id: I5a5779c3688451d89c0601dc913143d75048c9f6	2021-02-08 15:10:11 +00:00
q3k	943ab5b1a6	cluster/admitomatic: allow whitelist-source-range Without this, cert-manager gets stuck. Deployed to prod. Change-Id: I356cd44f455b6f4aecea9ae396f6a05e1a727859	2021-02-07 23:35:28 +00:00
q3k	f40c9249ce	cluster/kube: allow system:admin-namespaces to modify ingresses This will give any binding to system:admin-namespaces (eg. personal-* namespaces, per-namespace extra admin access like matrix-0x3c) the ability to create and update ingresses. Change-Id: I522896ebe290fe982d6fe46b7b1d604d22b4f72c	2021-02-07 19:24:43 +00:00
q3k	41bbf1436a	cluster/kube: deploy admitomatic webhook This has been (successfully) tested on prod and then rolled back. Change-Id: I22657f66b4aeaa8a0ae452035ba18a79f4549b14	2021-02-07 19:19:23 +00:00
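
The registry fix in 10384cd394 comes down to rule ordering. This sketch assumes the registry's auth server evaluates ACL entries from top to bottom and stops at the first match. The commit message implies this, but the actual ACL config format is not shown on this page. Under that assumption, a broad public-pull entry shadows every more specific entry listed after it. The Go program below illustrates this with a hypothetical `aclRule` type, a hypothetical `firstMatch` helper and a made-up `common/` prefix. It is not the hscloud registry code.

```go
package main

import (
	"fmt"
	"strings"
)

// aclRule is a hypothetical, simplified ACL entry: it matches repositories
// whose name starts with prefix. desc stands in for whatever access the
// real rule would grant.
type aclRule struct {
	prefix string
	desc   string
}

// firstMatch walks the rules top to bottom and returns the first one whose
// prefix matches repo. Later rules are never consulted once one matches.
func firstMatch(rules []aclRule, repo string) (aclRule, bool) {
	for _, r := range rules {
		if strings.HasPrefix(repo, r.prefix) {
			return r, true
		}
	}
	return aclRule{}, false
}

func main() {
	public := aclRule{prefix: "", desc: "public pull"} // empty prefix matches everything
	common := aclRule{prefix: "common/", desc: "common namespace rule"}

	// Broad rule ahead of the specific one: the specific rule is shadowed.
	before := []aclRule{public, common}
	// Specific rule moved to the top: it wins for common/*, and public pull
	// still covers everything else.
	after := []aclRule{common, public}

	for _, rules := range [][]aclRule{before, after} {
		if r, ok := firstMatch(rules, "common/some-image"); ok {
			fmt.Println(r.desc)
		}
	}
}
```

Under first-match semantics, the fix is a reordering rather than a change to the rules themselves.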


156 Commits (7f5f2099c5d3e9762345e27bad1c2d69ca6220ff)