Commit Graph

22 Commits (68d82124d93c7bb65df0cf0651aaecca95d1dec6)

Author SHA1 Message Date
implr ac4f99e2e1 cluster/machines/dcr01s24: pivot to lvm root and efi boot
Change-Id: I2df08a0ff7366607781421e6fe8c0ddce86e57a5
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1781
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-11-12 19:36:25 +00:00
implr f47d359a28 cluster/machines/dcr01s22: pivot to mirrored efi boot
Change-Id: I673bad18915ee76e0f35c56e689345f360d295dc
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1771
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-11-12 19:36:25 +00:00
implr b8ccfa8459 cluster/machines: move common LVM support bits into base.nix
Change-Id: I13e5653241a8245bae67cc7e660312484f1dcaca
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1767
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-11-12 01:31:39 +00:00
implr 8edc52e619 c/m/dcr01s22: pivot to lvm root
The bootloader is *not* moved yet, machine still boots off the old disk

Change-Id: I8cc92489bb06bfe9581d68503237e08fa8082c7c
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1766
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-11-12 01:30:42 +00:00
implr b37b70cbd4 cluster/m/m/base: chronyd: enable rtc sync, aggresively step
Change-Id: I61827ec2c77e79ce3e394eb2574372d3c21394d8
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1765
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-11-12 01:30:42 +00:00
informatic 4d3a0cc123 cluster/kube-common: avoid full nixpkgs checkouts
fetchGit was unnecessarily fetching full nixpkgs repository during
evaluation.

Change-Id: Ia22a234938014659d4c33e16c5028a63884d476c
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1728
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-11-06 21:55:24 +00:00
implr c783390cf5 cluster/m/m/base: add a bunch of utilities to systemPackages
Change-Id: I8ad61f925011d019b8ef868013fcb266947a9c94
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1755
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-11-01 23:12:07 +00:00
q3k a5ba554446 k0: enable fstrim, lower gc thresh for kubelet
fstrim is nice as it might prevent us from killing SSDs so fast.

A lower GC threshold for kubelet is nice as we run non-kubelet services
on these nodes, and they need their space. Notably, Ceph's mons tend to
be extremely claustrophobic, firing alerts at 70% disk usage or so.

Change-Id: I94c1787e62f82a02f107d04a87575327d3d79c01
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1724
Reviewed-by: implr <implr@hackerspace.pl>
2023-10-13 11:47:36 +00:00
q3k 43b6db895d k0: fully disable kube control/data plane on bc01n01,n02
Change-Id: I103f41059d75aa6b3ce318fd6f863f50ad013160
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1697
Reviewed-by: implr <implr@hackerspace.pl>
2023-10-09 23:32:26 +00:00
implr bae9499880 cluster/machines: enable controlplane on dcr03s16, disable on bc01n01
Change-Id: I199f66ac60c522c29fe4900702eb9eed48749cfe
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1692
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-10-09 19:10:19 +00:00
implr 9a88f28805 cluster/{machines,certs}: add dcr03s16.hswaw.net
Also make dataplane-only nodes actually work:
- make kubeproxy use the same package as kubelet
- disable firewall

Change-Id: I7babbb749656e6f75151c8eda6e3f09f3c6bff5f
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1686
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-10-09 19:02:18 +00:00
q3k 9251121fa9 cluster/certs: remove old kube CA
This completes the migration away from the old CA/cert infrastructure.

The tool which was used to generate all these certs will come next. It's
effectively a reimplementation of clustercfg in Go.

We also removed the unused kube-serviceaccounts cert, which was
generated by the old tooling for no good reason (we only need a key for
service accounts, not an actual cert...).

Change-Id: Ied9e5d8fc90c64a6b4b9fdd20c33981410c884b4
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1501
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-04-01 13:55:18 +00:00
q3k bbc5a43d77 cluster: move kubernetes services to temporary CA bundle
This is already deployed, and it allows Kubernetes components
(temporary) freedom to use the old or new CA cert.

Change-Id: I8ac7f773a333c30fa22902b8edc327c0c700a482
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1490
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-03-31 22:53:59 +00:00
implr 779727b39e machines/bc01n05: postgres: auth, hba, more ram
Change-Id: Id10b97efa3588a2a9147a349391da559e6cce7e5
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1482
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-03-28 21:22:50 +00:00
implr 3b0887397a machines/bc01n05: postgres tuning
Change-Id: I30925a84216b45bde9e92b67b007f15b2cdf58e8
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1481
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-03-26 12:16:20 +00:00
implr 821b839b16 machines/bc01n05: zfsify; initial postgres
Change-Id: I355ac4aa3c56a1e6a564b7a3c7cfc4e67b072dae
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1470
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-03-11 21:33:14 +00:00
implr 3320155d23 cluster/machines/base: enable microcode loading
This will happen at next boot via early microcode - no risk to currently
running processes.

Change-Id: I88553fa9a1350ebb80aaf978e29e8f1156783a2c
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1469
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-03-11 21:33:05 +00:00
q3k 712a5dc3e3 cluster: add bc01n05.hswaw.net
This will be our postgres pet machine.

Change-Id: Ifff6648394ca6407fb5b5daa853f4abc42541703
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1467
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-03-04 22:26:46 +00:00
q3k 3a9562ecfd cluster: k0: remove native ceph
After installing HBJ11s and spreading out the mons we're going full
Rook.

Change-Id: Ia00cbe953548f06cf27343371fc67890619c8262
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1466
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-03-04 22:26:39 +00:00
q3k ef3aab6a14 k0: host os bump wip
This bumps it on bc01n01, but nowhere else yet.

We have to vendor some more kubelet bits unfortunately.

Change-Id: Ifb169dd9c2c19d60f88d946d065d4446141601b1
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1465
Reviewed-by: implr <implr@hackerspace.pl>
2023-03-04 22:26:14 +00:00
patryk a2bcfeaf0b cluster: bump vm.max_map_count sysctl tunable to a higher value
This is needed for running some memory-intensive workloads, like
ElasticSearch/OpenSearch.

Change-Id: I7b00ec5faca73ec69bdbf1ca41c025d7efeae55c
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1443
Reviewed-by: implr <implr@hackerspace.pl>
2022-12-11 20:28:51 +00:00
q3k 55a486ae49 cluster: refactor nix machinery to fit //ops
This is a chonky refactor that get rids of the previous cluster-centric
defs-* plain nix file setup.

Now, nodes are configured individually in plain nixos modules, and are
provided a view of all other nodes in the 'machines' attribute. Cluster
logic is moved into modules which inspect this array to find other nodes
within the same cluster.

Kubernetes options are not fully clusterified yet (ie., they are still
hardcode to only provide the 'k0' cluster) but that can be fixed later.
The Ceph machinery is a good example of how that can be done.

The new NixOS configs are zero-diff against prod. While this is done
mostly by keeping the logic, we had to keep a few newly discovered
'bugs' around by adding some temporary options which keeps things as they
are. These will be removed in a future CL, then introducing a diff (but
no functional changes, hopefully).

We also remove the nix eval from clustercfg as it was not used anymore
(basically since we refactored certs at some point).

Change-Id: Id79772a96249b0e6344046f96f9c2cb481c4e1f4
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1322
Reviewed-by: informatic <informatic@hackerspace.pl>
2022-06-19 11:48:52 +00:00