Instead of waiting for backports or even rolling forward unstable, let's
just patch the bug out.
This has been deployed on:
- dcr01s22.hswaw.net
- dcr01s24.hswaw.net
- dcr03s16.hswaw.net
- snowflake.hswaw.net
Change-Id: I0ad8ea37bd15bc9bd4e814cdf3eda7b2c47bb03e
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1988
Reviewed-by: implr <implr@hackerspace.pl>
prep for postgresql database migration from the instance running on old
dell blade server.
on snowflake side, mostly a copy-paste of configuration from bc01n05,
from which the database instance will be migrated, with a few
adjustments for newer nixpkgs/nixos.
on matrix/k8s side, just a change of host.
and a drive-by rename from `.hackerspace.pl` to `.hswaw.net`
Change-Id: I0e78162270ebb3244078e34dee0cd4629d5598ca
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1986
Reviewed-by: q3k <q3k@hackerspace.pl>
This adds one of the 4 new fast machines that will run various one-off
workloads, initially mostly migrated off of the old dell m1000e blade
chassis, such as a virtualized boston-packets.
Change-Id: I4a85f8e14cd79257ad41bbe1519f33595f4e497a
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1981
Reviewed-by: q3k <q3k@hackerspace.pl>
I know the comments are wrong, I'll clean them up once we get rid of the
old nixpkgs fetch completely.
Change-Id: Ia64d2d0908fc834cb976afbb415c8d1283433a38
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1865
Reviewed-by: q3k <q3k@hackerspace.pl>
This adds automatic scraping of pods and services based on presence of
annotations:
- prometheus.io/scrape
- prometheus.io/port
- prometheus.io/path
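As a rough illustration of how these annotations are usually interpreted
(a sketch only, not the actual scrape config in this change): the function
name, the "true" opt-in value and the /metrics default path below are
assumptions based on the common Prometheus annotation convention.

```python
from typing import Mapping, Optional

SCRAPE = "prometheus.io/scrape"
PORT = "prometheus.io/port"
PATH = "prometheus.io/path"


def scrape_url(annotations: Mapping[str, str], address: str,
               default_port: int) -> Optional[str]:
    """Return the metrics URL for a pod/service, or None if it is not scraped.

    Hypothetical helper: it mirrors the annotation semantics in plain Python
    and is not part of the monitoring stack itself.
    """
    # Only targets that explicitly opt in are scraped (assumed value: "true").
    if annotations.get(SCRAPE) != "true":
        return None
    # The port annotation, when present, overrides the discovered port.
    port = int(annotations.get(PORT, default_port))
    # Assumed default path, same as Prometheus' default metrics_path.
    path = annotations.get(PATH, "/metrics")
    return f"http://{address}:{port}{path}"
```

With this reading, a pod annotated only with prometheus.io/scrape: "true"
is scraped on its discovered port at /metrics, and the other two
annotations are optional overrides.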
Change-Id: I1c1afecc75c30278889de1f6ca0b17da69997295
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1850
Reviewed-by: implr <implr@hackerspace.pl>
Rename `target_service` to `target` to mirror Service's `target`; rename `extra_paths` to `extraPaths` to follow the camelCase convention used everywhere except for a few places in kube.upstream (assumed to be a mistake).
Change-Id: Icfcb70ef889e3359bf0391c465034817f4b70cce
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1809
Reviewed-by: q3k <q3k@hackerspace.pl>
Introduce a convention of declaring a secretsRefs:: object below cfg:: for containing all secretKeyRefs. The goal is to self-document all secrets that need to be created in order to deploy a service.
Change-Id: I3a990d54f65a288f5e748262c576d2a120efd815
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1806
Reviewed-by: q3k <q3k@hackerspace.pl>
A convention is introduced to specify a `local top = self` declaration at the top of an app/service/component's jsonnet, representing the top-level object. The reasoning is as follows:
- `top` is more universal/unambiguous than `app`
- `top` is usually shorter than $NAME
- a conventional `top` instead of $NAME (coupled with other conventions introduced) makes app jsonnets wonderfully copy-paste'able, aiding in learning and quickly building
Change-Id: I7ece83ce7e97021ad98a6abb3500fb9839936811
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1805
Reviewed-by: q3k <q3k@hackerspace.pl>
A convention is introduced to specify the kube.Namespace object in a deployment as a `local ns` instead of an `ns:` or a `namespace:` for these reasons:
- non-cluster admins cannot create new namespaces, and we've been moving in the direction of specifying objects that require cluster admin permissions to apply (policies, role bindings) in //cluster/kube/k0 instead of in the app jsonnet
- namespace admins CAN delete the namespace, making `kubecfg delete` unexpectedly dangerous (especially if a namespace contains more than just the contents of the file being applied - common with personal namespaces)
- `.Contain()` is a common operation, and it shows up in lines that are pretty long, so `ns.Contain()` is preferable to `app.ns.Contain()` or `service.namespace.Contain()`
Change-Id: Ie4ea825376dbf6faa175179054f3ee3de2253ae0
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1804
Reviewed-by: q3k <q3k@hackerspace.pl>
There's no difference as far as jsonnet is concerned, but it may confuse newbies, as Service and SimpleIngress use a double colon for their top-level kube helpers. This also removes any ambiguity as to whether this is manifested in the final JSON, so we can make that a convention.
Change-Id: I01ad4ea63f4d5d8ee6e5d41c79637ba186548c6f
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1803
Reviewed-by: q3k <q3k@hackerspace.pl>
Also make dataplane-only nodes actually work:
- make kubeproxy use the same package as kubelet
- disable firewall
Change-Id: I7babbb749656e6f75151c8eda6e3f09f3c6bff5f
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1686
Reviewed-by: q3k <q3k@hackerspace.pl>
This replaces the old clustercfg script with a brand spanking new
mostly-equivalent Go reimplementation. But it's not exactly the same,
here are the differences:
1. No cluster deployment logic anymore - we expect everyone to use
ops/machine at this point.
2. All certs/keys are Ed25519 and do not expire by default - but
support for short-lived certificates is there, and is actually more
generic and reusable. Currently it's only used for admincreds.
3. Speaking of admincreds: the new admincreds automatically figure out
your username.
4. admincreds also doesn't shell out to kubectl anymore, and doesn't
override your default context. The generated creds can live
peacefully alongside your normal prodaccess creds.
5. gencerts (the new nodestrap without deployment support) now
automatically generates certs for all nodes, based on local Nix
modules in ops/.
6. No secretstore support. This will be changed once we rebuild
secretstore in Go. For now users are expected to manually run
secretstore sync on cluster/secrets.
Change-Id: Ida935f44e04fd933df125905eee10121ac078495
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1498
Reviewed-by: q3k <q3k@hackerspace.pl>
We accidentally bumped nixpkgs at https://gerrit.hackerspace.pl/1441 and
forgot to upgrade it. We don't wanna upgrade it right now.
This doesn't give us back a zero-diff, but it's close enough.
Change-Id: I1a9f50df88e564cd4de76f67adfaa1e88a746f2e
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1471
Reviewed-by: patryk <patryk@hackerspace.pl>
This will be our postgres pet machine.
Change-Id: Ifff6648394ca6407fb5b5daa853f4abc42541703
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1467
Reviewed-by: q3k <q3k@hackerspace.pl>
After installing HBJ11s and spreading out the mons we're going full
Rook.
Change-Id: Ia00cbe953548f06cf27343371fc67890619c8262
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1466
Reviewed-by: q3k <q3k@hackerspace.pl>
This bumps it on bc01n01, but nowhere else yet.
We have to vendor some more kubelet bits unfortunately.
Change-Id: Ifb169dd9c2c19d60f88d946d065d4446141601b1
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1465
Reviewed-by: implr <implr@hackerspace.pl>
This ensures, for example, that the packets are for the correct
architecture.
Change-Id: If17c307fbad02ee72c6dd21a874c59514415ab2e
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1334
Reviewed-by: implr <implr@hackerspace.pl>
This adds two brand new AArch64 machines: a generic builder (and
instructions on how to use it) and tv1.waw, an RPi4 acting as digital
signage in the space.
Change-Id: I8d38344ec35f99f4b872cf9526f6e6771fbffc43
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1330
Reviewed-by: informatic <informatic@hackerspace.pl>
This is a chonky refactor that gets rid of the previous cluster-centric
defs-* plain nix file setup.
Now, nodes are configured individually in plain nixos modules, and are
provided a view of all other nodes in the 'machines' attribute. Cluster
logic is moved into modules which inspect this array to find other nodes
within the same cluster.
Kubernetes options are not fully clusterified yet (i.e., they are still
hardcoded to only provide the 'k0' cluster) but that can be fixed later.
The Ceph machinery is a good example of how that can be done.
The new NixOS configs are zero-diff against prod. While this is done
mostly by keeping the logic, we had to keep a few newly discovered
'bugs' around by adding some temporary options which keep things as they
are. These will be removed in a future CL, which will introduce a diff
(but no functional changes, hopefully).
We also remove the nix eval from clustercfg as it was not used anymore
(basically since we refactored certs at some point).
Change-Id: Id79772a96249b0e6344046f96f9c2cb481c4e1f4
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1322
Reviewed-by: informatic <informatic@hackerspace.pl>
This makes the hscloud readTree object available in NixOS modules as
follows:
    { config, pkgs, workspace, ... }: {
      environment.systemPackages = [
        workspace.hswaw.laserproxy
      ];
    }
Change-Id: I9c8146f5156ffe5d06cb8408a2ce632657990d59
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1164
Reviewed-by: q3k <q3k@hackerspace.pl>
First pass at a non-rook-managed Ceph cluster. We call it k0 instead of
ceph-waw4, as we are pretty much sure now that we will always have a
one-kube-cluster-to-one-ceph-cluster correspondence, with different Ceph
pools for different media kinds (if at all).
For now this has one mon and spinning rust OSDs. This can be iterated on
to make it less terrible with time.
See b/6 for more details.
Change-Id: Ie502a232c700af93f33fcad9fa1c57058161aa11
This moves the diff-and-activate logic from cluster/nix/provision.nix
into ops/{provision,machines}.nix that can be used for both cluster
machines and bgpwtf machines.
The provisioning scripts now live per-NixOS-config, and anything under
ops.machines.$fqdn now has a .passthru.hscloud.provision derivation
which is that script. When run, it will attempt to deploy onto the
target machine.
There's also a top-level tool at `ops.provision` which builds all
configurations / machines and can be called with the machine name/fqdn
to call the corresponding provisioner script.
clustercfg is changed to use the new provisioning logic.
Change-Id: I258abce9e8e3db42af35af102f32ab7963046353
This is necessary for the NixOS EFI boot machinery to pick up the new
derivation when switching to it, otherwise the machine will not boot
into the newly switched configuration.
Change-Id: I8b18956d2afeea09c38462f09a00c345cf86f80d
Stopgap until we finish b/3: we need to deploy some changes on it without
rebooting into newer nixpkgs.
Change-Id: Ic2690dfcb398a419338961c8fcbc7e604298977a