hscloud/ops
Serge Bazanski 9f0e1e88f1 cluster/clustercfg: rewrite it in Go
This replaces the old clustercfg script with a brand spanking new
mostly-equivalent Go reimplementation. But it's not exactly the same,
here are the differences:

 1. No cluster deployment logic anymore - we expect everyone to use ops/
    machine at this point.
 2. All certs/keys are Ed25519 and do not expire by default - but
    support for short-lived certificates is there, and is actually more
    generic and reusable. Currently it's only used for admincreds.
 3. Speaking of admincreds: the new admincreds automatically figure out
    your username.
 4. admincreds also doesn't shell out to kubectl anymore, and doesn't
    override your default context. The generated creds can live
    peacefully alongside your normal prodaccess creds.
 5. gencerts (the new nodestrap without deployment support) now
    automatically generates certs for all nodes, based on local Nix
    modules in ops/.
 6. No secretstore support. This will be changed once we rebuild
    secretstore in Go. For now users are expected to manually run
    secretstore sync on cluster/secrets.

Change-Id: Ida935f44e04fd933df125905eee10121ac078495
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1498
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-06-19 22:23:52 +00:00
..
ceph cluster: deploy NixOS-based ceph 2021-09-11 20:33:24 +00:00
monitoring cluster/clustercfg: rewrite it in Go 2023-06-19 22:23:52 +00:00
sso/kube *: post-certmanager manifests update 2023-06-19 21:20:44 +00:00
exports.nix cluster/clustercfg: rewrite it in Go 2023-06-19 22:23:52 +00:00
machines.nix ops: repin cluster machines to older nixpkgs checkout 2023-03-31 22:53:59 +00:00
provision.nix ops, cluster: consolidate NixOS provisioning 2021-09-10 23:55:52 +00:00
README.md hswaw/machines: add tv1, larrythebuilder 2022-07-06 19:49:37 +00:00

Operations

Deploying NixOS machines

Machine configurations are in ops/machines.nix.

Wrapper script to show all available machines and provision a single machine:

 $ $(nix-build -A ops.provision)
 Available machines:
  - bc01n01.hswaw.net
  - bc01n02.hswaw.net
  - dcr01s22.hswaw.net
  - dcr01s24.hswaw.net
  - edge01.waw.bgp.wtf

 $ $(nix-build -A ops.provision) edge01.waw.bgp.wtf

This can be slow, as it evaluates/builds all machines' configs. If you just want to deploy one machine and possible iterate faster:

$ $(nix-build -A 'ops.machines."edge01.waw.bgp.wtf".config.passthru.hscloud.provision')

Remote Builders (cross-compiling)

If you're attempting to deploy a machine which has a system architecture other than your host machine (eg. are deploying an Aarch64 Raspberry Pi4 from an Intel machine), you'll need to use a remote builder which has that target architecture.

Any machine of that target architecture running Nix(OS) will do, even the machine you're deploing. But we also have some dedicated build machines:

Name Architecture CPUs RAM
larrythebuilder.q3k.org AArch64 4 24GiB

To use a machine $name as a remote builder:

  1. Make sure you have access to the machine. ssh $username@$name should work. If not, file a CR to get your key added to the machine and ask someone to review and deploy it. The machines' key confiurations are in hscloud.

  2. Check nix store ping --store ssh-ng://$username@$name. It should work.

  3. On NixOS, configure builders in your system configuration.nix and rebuild, eg.:

nix.buildMachines = [
  {
    system = "aarch64-linux";
    sshUser = "root";
    sshKey = "/home/q3k/.ssh/id_ed25519";
    maxJobs = 4;
    hostName = "larrythebuilder.q3k.org";
  }
];
nix.distributedBuilds = true;
  1. On non-NixOS, configure builders in your nix.conf, eg. builders = ssh://$username@$name aarch64-linux in your system/user nix.conf. Your nix-daemon should also specify that the local user is trusted.

We should automate this some day.