Also make dataplane-only nodes actually work:
- make kubeproxy use the same package as kubelet
- disable firewall
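For illustration, a dataplane-only node's NixOS config then boils down to
something like the sketch below; networking.firewall.enable is a stock
NixOS option, while the hscloud.* option names are made up here and do not
reflect the actual module interface:

  { config, pkgs, ... }: {
    # Dataplane-only node: just kubelet and kube-proxy, no control plane.
    networking.firewall.enable = false;

    # Illustrative option names - the real module interface differs.
    hscloud.kube.dataplane = {
      # kubelet and kube-proxy come from the same kubernetes package, so
      # their versions cannot drift apart.
      kubelet.package = pkgs.kubernetes;
      proxy.package = config.hscloud.kube.dataplane.kubelet.package;
    };
  }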
Change-Id: I7babbb749656e6f75151c8eda6e3f09f3c6bff5f
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1686
Reviewed-by: q3k <q3k@hackerspace.pl>
This replaces the old clustercfg script with a brand spanking new
mostly-equivalent Go reimplementation. But it's not exactly the same; here
are the differences:
1. No cluster deployment logic anymore - we expect everyone to use ops/
machine at this point.
2. All certs/keys are Ed25519 and do not expire by default - but
support for short-lived certificates is there, and is actually more
generic and reusable. Currently it's only used for admincreds.
3. Speaking of admincreds: the new admincreds automatically figure out
your username.
4. admincreds also doesn't shell out to kubectl anymore, and doesn't
override your default context. The generated creds can live
peacefully alongside your normal prodaccess creds.
5. gencerts (the new nodestrap without deployment support) now
automatically generates certs for all nodes, based on local Nix
modules in ops/.
6. No secretstore support. This will be changed once we rebuild
secretstore in Go. For now users are expected to manually run
secretstore sync on cluster/secrets.
Change-Id: Ida935f44e04fd933df125905eee10121ac078495
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1498
Reviewed-by: q3k <q3k@hackerspace.pl>
We accidentally bumped nixpkgs at https://gerrit.hackerspace.pl/1441 and
forgot to upgrade to it. We don't wanna upgrade right now.
This doesn't give us back a zero-diff, but it's close enough.
Change-Id: I1a9f50df88e564cd4de76f67adfaa1e88a746f2e
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1471
Reviewed-by: patryk <patryk@hackerspace.pl>
This will be our postgres pet machine.
Change-Id: Ifff6648394ca6407fb5b5daa853f4abc42541703
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1467
Reviewed-by: q3k <q3k@hackerspace.pl>
After installing HBJ11s and spreading out the mons we're going full
Rook.
Change-Id: Ia00cbe953548f06cf27343371fc67890619c8262
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1466
Reviewed-by: q3k <q3k@hackerspace.pl>
This bumps it on bc01n01, but nowhere else yet.
We have to vendor some more kubelet bits unfortunately.
Change-Id: Ifb169dd9c2c19d60f88d946d065d4446141601b1
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1465
Reviewed-by: implr <implr@hackerspace.pl>
This ensures, for example, that the packages are for the correct
architecture.
Change-Id: If17c307fbad02ee72c6dd21a874c59514415ab2e
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1334
Reviewed-by: implr <implr@hackerspace.pl>
This adds two brand new AArch64 machines: a generic builder (and
instructions on how to use it) and tv1.waw, an RPi4 acting as digital
signage in the space.
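Using the remote AArch64 builder from another machine generally amounts to
something like the sketch below; the hostname, user and key path are
placeholders, see the in-repo instructions for the real values:

  {
    # Offload aarch64-linux builds to the new builder over SSH.
    nix.distributedBuilds = true;
    nix.buildMachines = [{
      hostName = "aarch64-builder.example.hswaw.net";  # placeholder
      sshUser = "builder";                             # placeholder
      sshKey = "/root/.ssh/id_builder";                # placeholder
      system = "aarch64-linux";
      maxJobs = 4;
    }];
  }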
Change-Id: I8d38344ec35f99f4b872cf9526f6e6771fbffc43
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1330
Reviewed-by: informatic <informatic@hackerspace.pl>
This is a chonky refactor that gets rid of the previous cluster-centric
defs-* plain nix file setup.
Now, nodes are configured individually in plain nixos modules, and are
provided a view of all other nodes in the 'machines' attribute. Cluster
logic is moved into modules which inspect this array to find other nodes
within the same cluster.
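As a rough sketch (the exact shape of the 'machines' attribute and all
hscloud.* option names below are assumptions, not the actual interface),
such a cluster-aware module might look like:

  { config, lib, machines, ... }:
  let
    # All nodes that declare the same (hypothetical) cluster name as this one.
    peers = lib.filterAttrs
      (fqdn: node:
        (node.config.hscloud.cluster.name or null) == config.hscloud.cluster.name)
      machines;
  in {
    # Example: render the peer list into a config file for a clustered service.
    environment.etc."cluster/peers".text =
      lib.concatStringsSep "\n" (lib.attrNames peers);
  }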
Kubernetes options are not fully clusterified yet (i.e., they are still
hardcoded to only provide the 'k0' cluster), but that can be fixed later.
The Ceph machinery is a good example of how that can be done.
The new NixOS configs are zero-diff against prod. While this is mostly
achieved by keeping the logic as it was, we had to keep a few newly
discovered 'bugs' around by adding some temporary options which keep
things as they are. These will be removed in a future CL, introducing a
diff at that point (but no functional changes, hopefully).
We also remove the nix eval from clustercfg as it was not used anymore
(basically since we refactored certs at some point).
Change-Id: Id79772a96249b0e6344046f96f9c2cb481c4e1f4
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1322
Reviewed-by: informatic <informatic@hackerspace.pl>
This makes the hscloud readTree object available as follows in NixOS
modules:
  { config, pkgs, workspace, ... }: {
    environment.systemPackages = [
      workspace.hswaw.laserproxy
    ];
  }
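One plausible way to wire such an extra module argument in (the actual
mechanism used here may differ) is to pass it through specialArgs when
evaluating the system:

  # Sketch only; 'hscloudWorkspace' stands in for however the readTree root
  # of the repository is obtained.
  { hscloudWorkspace }:
  import <nixpkgs/nixos/lib/eval-config.nix> {
    system = "x86_64-linux";
    modules = [ ./machine.nix ];
    # Every module now receives 'workspace' as an argument, as used above.
    specialArgs = { workspace = hscloudWorkspace; };
  }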
Change-Id: I9c8146f5156ffe5d06cb8408a2ce632657990d59
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1164
Reviewed-by: q3k <q3k@hackerspace.pl>
First pass at a non-rook-managed Ceph cluster. We call it k0 instead of
ceph-waw4, as we're now pretty much sure that we will always have a
one-kube-cluster-to-one-ceph-cluster correspondence, with different Ceph
pools for different media kinds (if at all).
For now this has one mon and spinning rust OSDs. This can be iterated on
to make it less terrible with time.
See b/6 for more details.
Change-Id: Ie502a232c700af93f33fcad9fa1c57058161aa11
This moves the diff-and-activate logic from cluster/nix/provision.nix
into ops/{provision,machines}.nix that can be used for both cluster
machines and bgpwtf machines.
The provisioning scripts now live per-NixOS-config, and anything under
ops.machines.$fqdn now has a .passthru.hscloud.provision derivation
which is that script. When run, it will attempt to deploy onto the
target machine.
There's also a top-level tool at `ops.provision` which builds all
configurations/machines and can be invoked with a machine name/FQDN to
run the corresponding provisioner script.
clustercfg is changed to use the new provisioning logic.
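Roughly, such a provisioner is a shell script derivation along the lines
of the sketch below; the script body and function arguments are
assumptions, only the .passthru.hscloud.provision attribute path comes
from this change:

  # toplevel = config.system.build.toplevel of the machine being provisioned.
  { pkgs, fqdn, toplevel }:
  pkgs.writeShellScriptBin "provision-${fqdn}" ''
    set -euo pipefail
    # Push the built system closure to the target machine, then activate it
    # (the real script also diffs against the currently running system first).
    nix-copy-closure --to root@${fqdn} ${toplevel}
    ssh root@${fqdn} -- ${toplevel}/bin/switch-to-configuration switch
  ''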
Change-Id: I258abce9e8e3db42af35af102f32ab7963046353
This is necessary for the NixOS EFI boot machinery to pick up the new
derivation when switching to it, otherwise the machine will not boot
into the newly switched configuration.
Change-Id: I8b18956d2afeea09c38462f09a00c345cf86f80d
Stopgap until we finish b/3; we need to deploy some changes on it without
rebooting into a newer nixpkgs.
Change-Id: Ic2690dfcb398a419338961c8fcbc7e604298977a
This is a basic grafana running on:
https://monitoring-global-dashboard.k0.hswaw.net/
It contains a data source pointing at the corresponding global
VictoriaMetrics. There are no dashboards yet; these will be provisioned
soon via jsonnet/grafonnet.
Change-Id: I84873bc323d1727096e3ce818fae122a9af3e191
These get scraped by their public IP addresses, which are retrieved via
service discovery in Prometheus (using the endpoints role on the
default/kubernetes service).
Also drive-by fix cluster prometheus resources - the default
configuration wants at least 3GB of physical memory.
Change-Id: I1eedb19051f62b40613f69e5f0f736d5958acf42
We handwavingly plan on implementing monitoring as a two-tier system:
- a 'global' component that is responsible for global aggregation,
long-term storage and alerting.
- multiple 'per-cluster' components that collect metrics from
Kubernetes clusters and export them to the global component.
In addition, several lower tiers (collected by per-cluster components)
might also be implemented in the future - for instance, specific to some
subprojects.
Here we start sketching out some basic jsonnet structure (currently all
in a single file, with little parametrization) and a cluster-level
prometheus server that scrapes Kubernetes Node and cAdvisor metrics.
This review is mostly to get this committed as early as possible, and to
make sure that the little existing Prometheus scrape configuration is
sane.
Change-Id: If37ac3b1243b8b6f464d65fee6d53080c36f992c