mirror of https://gerrit.hackerspace.pl/hscloud synced 2025-01-16 22:13:53 +00:00
Commit graph

26 commits

Author SHA1 Message Date
e433c3c929 cluster/machines/dcr03s16: tapes and tape accessories
Change-Id: Ib93fd85d0b09177d6e29bc3b4d68b999a1db3eaa
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1994
Reviewed-by: q3k <q3k@hackerspace.pl>
2024-10-19 08:43:50 +00:00
15e7348a0b cluster: remove dead machines
Change-Id: I3ff6680bc7212341ca626b0f560e1fe93efe3a35
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1987
Reviewed-by: ar <ar@hackerspace.pl>
2024-07-20 12:18:00 +00:00
de83f4904f cluster/machines: replace disk in dcr01s22
Change-Id: I22fefc9ff68295e33ab0a1f26ab2aeb02fb75210
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1854
Reviewed-by: q3k <q3k@hackerspace.pl>
Reviewed-by: implr <implr@hackerspace.pl>
2024-01-24 18:51:09 +00:00
a84e9bb884 cluster/machines: replace disk in dcr01s24
Change-Id: I144f23c571267543568a1bd132aea5a8a75db8f2
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1853
Reviewed-by: q3k <q3k@hackerspace.pl>
Reviewed-by: implr <implr@hackerspace.pl>
2024-01-24 18:51:09 +00:00
ac4f99e2e1 cluster/machines/dcr01s24: pivot to lvm root and efi boot
Change-Id: I2df08a0ff7366607781421e6fe8c0ddce86e57a5
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1781
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-11-12 19:36:25 +00:00
f47d359a28 cluster/machines/dcr01s22: pivot to mirrored efi boot
Change-Id: I673bad18915ee76e0f35c56e689345f360d295dc
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1771
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-11-12 19:36:25 +00:00
b8ccfa8459 cluster/machines: move common LVM support bits into base.nix
Change-Id: I13e5653241a8245bae67cc7e660312484f1dcaca
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1767
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-11-12 01:31:39 +00:00
8edc52e619 c/m/dcr01s22: pivot to lvm root
The bootloader is *not* moved yet, machine still boots off the old disk

Change-Id: I8cc92489bb06bfe9581d68503237e08fa8082c7c
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1766
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-11-12 01:30:42 +00:00
b37b70cbd4 cluster/m/m/base: chronyd: enable rtc sync, aggressively step
Change-Id: I61827ec2c77e79ce3e394eb2574372d3c21394d8
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1765
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-11-12 01:30:42 +00:00
4d3a0cc123 cluster/kube-common: avoid full nixpkgs checkouts
fetchGit was unnecessarily fetching full nixpkgs repository during
evaluation.

Change-Id: Ia22a234938014659d4c33e16c5028a63884d476c
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1728
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-11-06 21:55:24 +00:00
c783390cf5 cluster/m/m/base: add a bunch of utilities to systemPackages
Change-Id: I8ad61f925011d019b8ef868013fcb266947a9c94
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1755
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-11-01 23:12:07 +00:00
a5ba554446 k0: enable fstrim, lower gc thresh for kubelet
fstrim is nice as it might prevent us from killing SSDs so fast.

A lower GC threshold for kubelet is nice as we run non-kubelet services
on these nodes, and they need their space. Notably, Ceph's mons tend to
be extremely claustrophobic, firing alerts at 70% disk usage or so.

Change-Id: I94c1787e62f82a02f107d04a87575327d3d79c01
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1724
Reviewed-by: implr <implr@hackerspace.pl>
2023-10-13 11:47:36 +00:00
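As a rough illustration of the two settings described in this commit, a NixOS module fragment might look like the sketch below. It is not the content of the change itself: the use of the stock `services.kubernetes.kubelet.extraOpts` option and the threshold percentages are assumptions, and the repository may wire kubelet flags differently.

```nix
{ ... }:
{
  # Periodically run fstrim on mounted filesystems that support TRIM.
  services.fstrim.enable = true;

  # Start kubelet image garbage collection earlier than its default
  # high threshold of 85% disk usage, leaving room for non-kubelet services.
  # ASSUMPTION: option path and the 60/50 values are illustrative only.
  services.kubernetes.kubelet.extraOpts =
    "--image-gc-high-threshold=60 --image-gc-low-threshold=50";
}
```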
43b6db895d k0: fully disable kube control/data plane on bc01n01,n02
Change-Id: I103f41059d75aa6b3ce318fd6f863f50ad013160
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1697
Reviewed-by: implr <implr@hackerspace.pl>
2023-10-09 23:32:26 +00:00
bae9499880 cluster/machines: enable controlplane on dcr03s16, disable on bc01n01
Change-Id: I199f66ac60c522c29fe4900702eb9eed48749cfe
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1692
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-10-09 19:10:19 +00:00
9a88f28805 cluster/{machines,certs}: add dcr03s16.hswaw.net
Also make dataplane-only nodes actually work:
- make kubeproxy use the same package as kubelet
- disable firewall

Change-Id: I7babbb749656e6f75151c8eda6e3f09f3c6bff5f
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1686
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-10-09 19:02:18 +00:00
9251121fa9 cluster/certs: remove old kube CA
This completes the migration away from the old CA/cert infrastructure.

The tool which was used to generate all these certs will come next. It's
effectively a reimplementation of clustercfg in Go.

We also removed the unused kube-serviceaccounts cert, which was
generated by the old tooling for no good reason (we only need a key for
service accounts, not an actual cert...).

Change-Id: Ied9e5d8fc90c64a6b4b9fdd20c33981410c884b4
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1501
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-04-01 13:55:18 +00:00
bbc5a43d77 cluster: move kubernetes services to temporary CA bundle
This is already deployed, and it gives Kubernetes components the
(temporary) freedom to use either the old or the new CA cert.

Change-Id: I8ac7f773a333c30fa22902b8edc327c0c700a482
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1490
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-03-31 22:53:59 +00:00
779727b39e machines/bc01n05: postgres: auth, hba, more ram
Change-Id: Id10b97efa3588a2a9147a349391da559e6cce7e5
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1482
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-03-28 21:22:50 +00:00
3b0887397a machines/bc01n05: postgres tuning
Change-Id: I30925a84216b45bde9e92b67b007f15b2cdf58e8
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1481
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-03-26 12:16:20 +00:00
821b839b16 machines/bc01n05: zfsify; initial postgres
Change-Id: I355ac4aa3c56a1e6a564b7a3c7cfc4e67b072dae
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1470
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-03-11 21:33:14 +00:00
3320155d23 cluster/machines/base: enable microcode loading
This will happen at next boot via early microcode - no risk to currently
running processes.

Change-Id: I88553fa9a1350ebb80aaf978e29e8f1156783a2c
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1469
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-03-11 21:33:05 +00:00
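For context, early microcode loading on NixOS is usually a one-line option. The fragment below is a minimal sketch that assumes Intel CPUs; the commit does not say which vendor option it sets, and AMD machines would use `hardware.cpu.amd.updateMicrocode` instead.

```nix
{ ... }:
{
  # Prepend CPU microcode to the initrd so it is applied early at next boot.
  # Running processes are unaffected until the machine is rebooted.
  # ASSUMPTION: Intel hardware; the CPU vendor is not stated in the commit.
  hardware.cpu.intel.updateMicrocode = true;
}
```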
712a5dc3e3 cluster: add bc01n05.hswaw.net
This will be our postgres pet machine.

Change-Id: Ifff6648394ca6407fb5b5daa853f4abc42541703
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1467
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-03-04 22:26:46 +00:00
3a9562ecfd cluster: k0: remove native ceph
After installing HBJ11s and spreading out the mons we're going full
Rook.

Change-Id: Ia00cbe953548f06cf27343371fc67890619c8262
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1466
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-03-04 22:26:39 +00:00
ef3aab6a14 k0: host os bump wip
This bumps it on bc01n01, but nowhere else yet.

We have to vendor some more kubelet bits unfortunately.

Change-Id: Ifb169dd9c2c19d60f88d946d065d4446141601b1
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1465
Reviewed-by: implr <implr@hackerspace.pl>
2023-03-04 22:26:14 +00:00
a2bcfeaf0b cluster: bump vm.max_map_count sysctl tunable to a higher value
This is needed for running some memory-intensive workloads, like
ElasticSearch/OpenSearch.

Change-Id: I7b00ec5faca73ec69bdbf1ca41c025d7efeae55c
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1443
Reviewed-by: implr <implr@hackerspace.pl>
2022-12-11 20:28:51 +00:00
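A sysctl bump like this one can be expressed in a NixOS module as sketched below. The value 262144 is an assumption: it is the minimum that the Elasticsearch documentation asks for, and the commit does not state which value was chosen.

```nix
{ ... }:
{
  # Raise the per-process limit on memory map areas.
  # ASSUMPTION: 262144 is illustrative; the actual value is not given in the commit.
  boot.kernel.sysctl."vm.max_map_count" = 262144;
}
```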
55a486ae49 cluster: refactor nix machinery to fit //ops
This is a chonky refactor that gets rid of the previous cluster-centric
defs-* plain nix file setup.

Now, nodes are configured individually in plain nixos modules, and are
provided a view of all other nodes in the 'machines' attribute. Cluster
logic is moved into modules which inspect this array to find other nodes
within the same cluster.

Kubernetes options are not fully clusterified yet (i.e., they are still
hardcoded to only provide the 'k0' cluster) but that can be fixed later.
The Ceph machinery is a good example of how that can be done.

The new NixOS configs are zero-diff against prod. While this is done
mostly by keeping the logic, we had to keep a few newly discovered
'bugs' around by adding some temporary options which keep things as they
are. These will be removed in a future CL, which will then introduce a diff
(but no functional changes, hopefully).

We also remove the nix eval from clustercfg as it was not used anymore
(basically since we refactored certs at some point).

Change-Id: Id79772a96249b0e6344046f96f9c2cb481c4e1f4
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1322
Reviewed-by: informatic <informatic@hackerspace.pl>
2022-06-19 11:48:52 +00:00
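The idea of cluster logic inspecting a shared view of all machines can be sketched in plain Nix. Everything here is hypothetical: the host names, the `cluster` field, and the `peersOf` helper are invented for illustration and do not reflect the repository's actual option names.

```nix
let
  # Hypothetical view of all machines, keyed by host name.
  machines = {
    "a.example.net" = { cluster = "k0"; };
    "b.example.net" = { cluster = "k0"; };
    "c.example.net" = { cluster = null; };
  };

  # Names of all other machines that share the given machine's cluster.
  peersOf = self:
    builtins.filter
      (name: name != self
        && machines.${name}.cluster == machines.${self}.cluster)
      (builtins.attrNames machines);
in
  peersOf "a.example.net"
```

Evaluating this expression yields a list containing only `"b.example.net"`, the one other machine in the same hypothetical cluster.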