hscloud

hswaw

hscloud

mirror of https://gerrit.hackerspace.pl/hscloud

Author	SHA1	Message	Date
informatic	10384cd394	cluster/registry: fix common namespaces Public pull ACL in the middle had priority over our more specific rules - moving these to the top fixes common registry namespace ACLs. Change-Id: Ia6f05cef09c0db4eb71155d2c0e2d9944b81f903 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1522 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-06-19 23:15:37 +00:00
q3k	c1f372561a	cluster/admitomatic: implement opt-out namespaces Change-Id: I32d4b019211fa755e2b3b103b88ea3f4c14e500f Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1521 Reviewed-by: informatic <informatic@hackerspace.pl>	2023-06-19 22:54:33 +00:00
q3k	9f0e1e88f1	cluster/clustercfg: rewrite it in Go This replaces the old clustercfg script with a brand spanking new mostly-equivalent Go reimplementation. But it's not exactly the same, here are the differences: 1. No cluster deployment logic anymore - we expect everyone to use ops/ machine at this point. 2. All certs/keys are Ed25519 and do not expire by default - but support for short-lived certificates is there, and is actually more generic and reusable. Currently it's only used for admincreds. 3. Speaking of admincreds: the new admincreds automatically figure out your username. 4. admincreds also doesn't shell out to kubectl anymore, and doesn't override your default context. The generated creds can live peacefully alongside your normal prodaccess creds. 5. gencerts (the new nodestrap without deployment support) now automatically generates certs for all nodes, based on local Nix modules in ops/. 6. No secretstore support. This will be changed once we rebuild secretstore in Go. For now users are expected to manually run secretstore sync on cluster/secrets. Change-Id: Ida935f44e04fd933df125905eee10121ac078495 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1498 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-06-19 22:23:52 +00:00
informatic	7e841065b0	*: post-certmanager manifests update Change-Id: I745c850268c31777c5722a9833c8152a55615aed Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1512 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-06-19 21:20:44 +00:00
q3k	3dd3ff5dcd	cluster/cert-manager: update to v1.5.0 Change-Id: I7a4cdadc9956141292302bc004d09d6e9e22855e Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1497 Reviewed-by: informatic <informatic@hackerspace.pl>	2023-05-26 10:38:16 +00:00
q3k	ffdb97b7dd	cluster/prodaccess: fix cert migration bug Change-Id: I7426e60731b09c571aa7385f5213e998f04675a6 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1510 Reviewed-by: ironbound <ironbound@hackerspace.pl>	2023-04-14 08:13:39 +00:00
q3k	57df027f28	cluster/kube: add k0-cert-manager.jsonnet view Change-Id: I4d008839f6d6190d0d88fd3fff44974c4f2db2c0 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1499 Reviewed-by: implr <implr@hackerspace.pl>	2023-04-01 14:58:50 +00:00
q3k	9251121fa9	cluster/certs: remove old kube CA This completes the migration away from the old CA/cert infrastructure. The tool which was used to generate all these certs will come next. It's effectively a reimplementation of clustercfg in Go. We also removed the unused kube-serviceaccounts cert, which was generated by the old tooling for no good reason (we only need a key for service accounts, not an actual cert...). Change-Id: Ied9e5d8fc90c64a6b4b9fdd20c33981410c884b4 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1501 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-04-01 13:55:18 +00:00
q3k	bdf2fa326f	cluster/certs: finish replacing all CAs This finishes the regeneration of all cluster CAs/certs to be never expiring ED25519 certs. We still have leftovers of the old Kube CA (and it's still being accepted in Kubernetes components). Cleaning that up is the next step. Change-Id: I883f94fd8cef3e3b5feefdf56ee106e462bb04a9 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1500 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-04-01 13:55:14 +00:00
q3k	989dfa3183	cluster/kube: add k0-prodvider.jsonnet view Change-Id: I170fbef3008f906c26ed79387858c3c1e4e2e10c Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1496 Reviewed-by: implr <implr@hackerspace.pl>	2023-04-01 13:54:49 +00:00
q3k	7572f0790c	k0: add disks Already deployed, now rebalancing. Change-Id: I536a063bc346effd07a1700aeffe598cc35f6f7a Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1493 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-04-01 11:21:54 +00:00
q3k	073d850a95	cluster/prodvider: redeploy Change-Id: I7a6cce06bb7c2f495d5354d3a2bebef64e307e42 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1491 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-04-01 11:18:25 +00:00
q3k	bbc5a43d77	cluster: move kubernetes services to temporary CA bundle This is already deployed, and it allows Kubernetes components (temporary) freedom to use the old or new CA cert. Change-Id: I8ac7f773a333c30fa22902b8edc327c0c700a482 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1490 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-03-31 22:53:59 +00:00
q3k	3a6d67e0c4	cluster/prodvider: rewrite against x509 lib for ed25519 support This gets rid of cfssl for the kubernetes bits of prodvider, instead using plain crypto/x509. This also allows to support our new fancy ED25519 CA. Change-Id: If677b3f4523014f56ea802b87499d1c0eb6d92e9 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1489 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-03-31 22:53:59 +00:00
q3k	777aab92a9	cluster/prodaccess: use new kube CA cert Change-Id: I1bff03008a4a212ad93e5eaa112adaa2b0cad3e7 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1488 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-03-31 22:53:59 +00:00
q3k	a4f8a459b9	cluster: partial cert bump Done: 1. etcd peer CA & certs 2. etcd client CA & certs 3. kube CA (currently all components set to accept both new and old CA, new CA called ca-kube-new) 4. kube apiserver 5. kubelet & kube-proxy 6. prodvider intermediate TODO: 1. kubernetes controller-manager & kubernetes scheduler 2. kubefront CA 3. admitomatic? 4. undo bundle on kube CA components to fully transition away from old CA Change-Id: If529eeaed9a6a2063bed23c9d81c57b36b9a0115 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1487 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-03-31 22:53:59 +00:00
implr	779727b39e	machines/bc01n05: postgres: auth, hba, more ram Change-Id: Id10b97efa3588a2a9147a349391da559e6cce7e5 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1482 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-03-28 21:22:50 +00:00
implr	3b0887397a	machines/bc01n05: postgres tuning Change-Id: I30925a84216b45bde9e92b67b007f15b2cdf58e8 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1481 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-03-26 12:16:20 +00:00
implr	821b839b16	machines/bc01n05: zfsify; initial postgres Change-Id: I355ac4aa3c56a1e6a564b7a3c7cfc4e67b072dae Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1470 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-03-11 21:33:14 +00:00
implr	3320155d23	cluster/machines/base: enable microcode loading This will happen at next boot via early microcode - no risk to currently running processes. Change-Id: I88553fa9a1350ebb80aaf978e29e8f1156783a2c Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1469 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-03-11 21:33:05 +00:00
q3k	712a5dc3e3	cluster: add bc01n05.hswaw.net This will be our postgres pet machine. Change-Id: Ifff6648394ca6407fb5b5daa853f4abc42541703 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1467 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-03-04 22:26:46 +00:00
q3k	3a9562ecfd	cluster: k0: remove native ceph After installing HBJ11s and spreading out the mons we're going full Rook. Change-Id: Ia00cbe953548f06cf27343371fc67890619c8262 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1466 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-03-04 22:26:39 +00:00
q3k	ef3aab6a14	k0: host os bump wip This bumps it on bc01n01, but nowhere else yet. We have to vendor some more kubelet bits unfortunately. Change-Id: Ifb169dd9c2c19d60f88d946d065d4446141601b1 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1465 Reviewed-by: implr <implr@hackerspace.pl>	2023-03-04 22:26:14 +00:00
implr	0156ab24ca	cluster/kube/k0: remove implr-spark bucket, add implr bucket the spark one has been an abandoned experiment from years ago, and I could use a personal one right now Change-Id: I78a706c3371d441b2f8460fd796d0cfd9a198cc6 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1464 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-02-26 16:41:23 +00:00
implr	0173f501d7	cockroach: v20.2 -> v21.1 Following https://www.cockroachlabs.com/docs/v21.1/upgrade-cockroach-version?filters=linux --logtostderr is deprecated/removed, but AFAICT from the default config it will still log there: https://www.cockroachlabs.com/docs/v21.1/configure-logs#default-logging-configuration Change-Id: I7fb3f835693f955b37de24dc581140ea34b11630 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1461 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-01-30 21:16:42 +00:00
informatic	3b2a2a2ce1	cluster/k0: add paperless to admitomatic config Change-Id: I54df444cddca8a05febfb96af07b9e2f614639fc Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1453 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-01-05 09:12:18 +00:00
patryk	a2bcfeaf0b	cluster: bump vm.max_map_count sysctl tunable to a higher value This is needed for running some memory-intensive workloads, like ElasticSearch/OpenSearch. Change-Id: I7b00ec5faca73ec69bdbf1ca41c025d7efeae55c Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1443 Reviewed-by: implr <implr@hackerspace.pl>	2022-12-11 20:28:51 +00:00
q3k	d171263d6e	k0: remove waw-hdd-yolo-3 This was never used and only caused scary warnings during OSDs reboots due to lack of availability. Change-Id: I14eacd88855bc56e06f2a61cc2d914d985330852 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1423 Reviewed-by: implr <implr@hackerspace.pl>	2022-11-20 12:28:20 +00:00
implr	4d98cf5ca8	calico: move from etcd to crd Leaving the CRD definitions as YAML, extracted without modifications from the original install file - this should make upgrades simpler. Change-Id: I7211d2711e2af014b36dd887a951abb9e1032eb9 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1179 Reviewed-by: q3k <q3k@hackerspace.pl>	2022-11-19 21:40:34 +00:00
q3k	16842119d1	app/mastodon: deploy Change-Id: I88c104d1a8d5627355b01a8c48dc235635fca5ed Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1421 Reviewed-by: implr <implr@hackerspace.pl>	2022-11-18 12:15:22 +00:00
q3k	ee41e94e0a	k0: bump certs Change-Id: I9d7a48d64de5d1aa82a134a8c22bfc50ba8ad270 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1402 Reviewed-by: informatic <informatic@hackerspace.pl>	2022-10-09 20:22:43 +00:00
q3k	3c31f32307	cluster: bump prodvider certs Change-Id: Ieefe3c733dd40a94c13a5e1c1648dd43d27c180a Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1386 Reviewed-by: implr <implr@hackerspace.pl>	2022-09-10 15:46:39 +00:00
implr	e69e98da47	third_party/py: update rules_python, use pip-compile for requirements Change-Id: If8309e8e3a4b58142f7479005a9eb4cbb1043cdb Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1324 Reviewed-by: q3k <q3k@hackerspace.pl>	2022-07-05 21:27:31 +00:00
q3k	437b0c335f	rook: fix benji This unforks benji back into upstream. The old fork didn't support a new authentication method on Ceph, and we don't have multiple clusters anymore (so we don't need the functionality of the fork). Change-Id: Ie79313b2321ca2e22ad2874b75a71385af95105f Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1321 Reviewed-by: informatic <informatic@hackerspace.pl>	2022-06-19 11:49:12 +00:00
q3k	55a486ae49	cluster: refactor nix machinery to fit //ops This is a chonky refactor that get rids of the previous cluster-centric defs-* plain nix file setup. Now, nodes are configured individually in plain nixos modules, and are provided a view of all other nodes in the 'machines' attribute. Cluster logic is moved into modules which inspect this array to find other nodes within the same cluster. Kubernetes options are not fully clusterified yet (ie., they are still hardcode to only provide the 'k0' cluster) but that can be fixed later. The Ceph machinery is a good example of how that can be done. The new NixOS configs are zero-diff against prod. While this is done mostly by keeping the logic, we had to keep a few newly discovered 'bugs' around by adding some temporary options which keeps things as they are. These will be removed in a future CL, then introducing a diff (but no functional changes, hopefully). We also remove the nix eval from clustercfg as it was not used anymore (basically since we refactored certs at some point). Change-Id: Id79772a96249b0e6344046f96f9c2cb481c4e1f4 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1322 Reviewed-by: informatic <informatic@hackerspace.pl>	2022-06-19 11:48:52 +00:00
q3k	b0e3693c0e	cluster/kube: calico: fix etcd endpoints Change-Id: Ia93d355ca343fa5a42ec37fbcae9135cb5304f6e Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1285 Reviewed-by: implr <implr@hackerspace.pl>	2022-06-11 19:00:52 +00:00
implr	0544d27c04	tools, cluster/tools: bazel5 compat: remove unused import Change-Id: I8b264a6c36e4d0f1535f38ad1f41495e62061f26 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1308 Reviewed-by: daz <daz@hackerspace.pl>	2022-06-04 19:56:40 +00:00
q3k	d584e76ea3	cluster/clustercfg: fix for nix 2.4 Change-Id: I3f9ebd895495a23ec179ccd237389e8f3e531768 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1284 Reviewed-by: q3k <q3k@hackerspace.pl>	2022-04-04 17:51:44 +00:00
q3k	42c17872fd	cluster/certs: bump certs Change-Id: I549364c050a96f72859886e6b724e07924ee3964 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1282 Reviewed-by: q3k <q3k@hackerspace.pl>	2022-04-04 17:51:44 +00:00
implr	54a34b24a1	cluster/k0: ceph: add tape staging Change-Id: I7fdba86b15f92157888850d2905440b45fb36f17 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1263 Reviewed-by: q3k <q3k@hackerspace.pl>	2022-03-05 22:45:29 +00:00
patryk	d0a0b18e54	cluster: allow namespace admins to access certificate resources Change-Id: I532dadfe1799da43d12598e388141f8f9a3872de Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1250 Reviewed-by: q3k <q3k@hackerspace.pl>	2022-02-05 15:08:47 +00:00
q3k	bdd403c587	cluster: k0: move cockroachdb away from bc01n01, fixup joins Reminded by a power failure on bc01n0{1,2}, we migrate away from at least one of them into another server. We also fix up the startup join parameter to not include the node itself (which is not necessary, but a nice thing to have nonetheless). Since bc01n01 was the initial node of the cluster, we also disable the init job for k0 (which we don't care about anyway). Change-Id: I3406471c0f9542e9d802d39138e400b5a5e74794 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1176 Reviewed-by: q3k <q3k@hackerspace.pl>	2021-12-13 22:30:46 +00:00
implr	eca1e080d7	calico: restore CNI_NET_DIR Change-Id: I04e17f8639505f5b7cc42e86392abc175b7922db Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1178 Reviewed-by: q3k <q3k@hackerspace.pl>	2021-12-03 03:10:13 +00:00
implr	12f176c1eb	calico 3.14 -> 1.15 Change-Id: I9eceaf26017e483235b97c8d08717d2750fabe25 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/995 Reviewed-by: q3k <q3k@hackerspace.pl>	2021-11-20 22:12:52 +00:00
q3k	0f8e5a2132	*: do not require env.sh This removes the need to source env.{sh,fish} when working with hscloud. This is done by: 1. Implementing a Go library to reliably detect the location of the active hscloud checkout. That in turn is enabled by BUILD_WORKSPACE_DIRECTORY being now a thing in Bazel. 2. Creating a tool `hscloud`, with a command `hscloud workspace` that returns the workspace path. 3. Wrapping this tool to be accessible from Python and Bash. 4. Bumping all users of hscloud_root to use either the Go library or one of the two implemented wrappers. We also drive-by replace tools/install.sh to be a proper sh_binary, and make it yell at people if it isn't being ran as `bazel run //tools:install`. Finally, we also drive-by delete cluster/tools/nixops.sh which was never used. Change-Id: I7873714319bfc38bbb930b05baa605c5aa36470a Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1169 Reviewed-by: informatic <informatic@hackerspace.pl>	2021-10-17 21:21:58 +00:00
q3k	3b67afe81b	cluster/certs: refresh Change-Id: I2aa8fead4427b917afa4758ea0078125d9c4e914 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1153 Reviewed-by: q3k <q3k@hackerspace.pl>	2021-10-07 19:58:35 +00:00
informatic	e839f95079	cluster/kube/k0: add matrix and informatic personal ceph users Change-Id: Ied8d474709b8053e9fc339435d3ca1ca5fdfa710	2021-09-14 22:21:22 +02:00
q3k	4b8ee32246	cluster/kube: always enable flexdriver Documentation says [1] this is disabled by default in 1.1, but that documentation kinda lies [2]. [1] - `235d5a384b/Documentation/flexvolume.md (ceph-flexvolume-configuration)` [2] - `64e28af741 (diff-d1eb5cba50e3770b61ccd3c730cd40514053e1da0233dfe09b5e7967e76a2a6cL424-L425)` Change-Id: Ia92c99e137ed751db62c0f56d42c4901986d0bb8	2021-09-14 21:39:39 +02:00
q3k	38f72fe094	cluster: k0: move ceph-waw3 to proper realm/zonegroup With this we can use Ceph's multi-site support to easily migrate to our new k0 Ceph cluster. This migration was done by using radosgw-admin to rename the existing realm/zonegroup to the new names (hscloud and eu), and then reworking the jsonnet so that the Rook operator would effectively do nothing. It sounds weird that creating a bunch of CRs like Object{Realm,ZoneGroup,Zone} realm would be a no-op for the operator, but that's how Rook works - a CephObjectStore generally creates everything that the above CRs would create too, but implicitly. Adding the extra CRs just allows specifying extra settings, like names. (it wasn't fully a no-op, as the rgw daemon is parametrized by realm/zonegroup/zone names, so that had to be restarted) We also make the radosgw serve under object.ceph-eu.hswaw.net, which allows us to right away start using a zonegroup URL instead of the zone-only URL. Change-Id: I4dca55a705edb3bd28e54f50982c85720a17b877	2021-09-14 21:39:39 +02:00
q3k	18084c1e86	cluster/nix: k0: enable rgw on osds This enables radosgw wherever osds are. This should be fast and works for us because we have little osd hosts. Change-Id: I4ed014d2790d6c02a2ba8e775aaa1846032dee1e	2021-09-14 21:39:39 +02:00

1 2 3 4 5 ...

277 Commits (a27733bbfc66adb502c15c47a90db112b6a5abc2)