hscloud

Author	SHA1	Message	Date
q3k	9f0e1e88f1	cluster/clustercfg: rewrite it in Go This replaces the old clustercfg script with a brand spanking new mostly-equivalent Go reimplementation. But it's not exactly the same, here are the differences: 1. No cluster deployment logic anymore - we expect everyone to use ops/ machine at this point. 2. All certs/keys are Ed25519 and do not expire by default - but support for short-lived certificates is there, and is actually more generic and reusable. Currently it's only used for admincreds. 3. Speaking of admincreds: the new admincreds automatically figure out your username. 4. admincreds also doesn't shell out to kubectl anymore, and doesn't override your default context. The generated creds can live peacefully alongside your normal prodaccess creds. 5. gencerts (the new nodestrap without deployment support) now automatically generates certs for all nodes, based on local Nix modules in ops/. 6. No secretstore support. This will be changed once we rebuild secretstore in Go. For now users are expected to manually run secretstore sync on cluster/secrets. Change-Id: Ida935f44e04fd933df125905eee10121ac078495 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1498 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-06-19 22:23:52 +00:00
informatic	7e841065b0	*: post-certmanager manifests update Change-Id: I745c850268c31777c5722a9833c8152a55615aed Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1512 Reviewed-by: q3k <q3k@hackerspace.pl>	2023-06-19 21:20:44 +00:00
q3k	cc2ff79f01	ops/monitoring: move grafana to sso. Change-Id: Ib2ecf6820454a160834db2ac212b31d9d5306972	2021-01-30 17:26:47 +01:00
q3k	4f7caf8d86	ops/monitoring: deploy grafana This is a basic grafana running on: https://monitoring-global-dashboard.k0.hswaw.net/ It contains a data source pointing at the corresponding global victoria metrics. There are no dashboards; these will be provisioned soon via jsonnet/grafonnet. Change-Id: I84873bc323d1727096e3ce818fae122a9af3e191	2020-12-17 22:10:31 +00:00
q3k	cfc0496266	ops/monitoring: scrape apiserver, scheduler, and controller-manager These get scraped by public IP address, which get retrieved via service discovery in Prometheus (by using the endpoints role on the default/kubernetes service). Also drive-by fix cluster prometheus resources - the default configuration wants at least 3GB of physical memory. Change-Id: I1eedb19051f62b40613f69e5f0f736d5958acf42	2020-12-17 22:09:56 +00:00
q3k	7d311e9602	ops/monitoring: pull in grafonnet-7.0 Change-Id: Ie036ef767419418876a18255a5ad378f5cfa1535	2020-10-10 15:59:45 +00:00
q3k	363bf4f341	monitoring: global: implement This creates a basic Global instance, running Victoria Metrics on k0. Change-Id: Ib03003213d79b41cc54efe40cd2c4837f652c0f4	2020-10-06 14:28:27 +00:00
q3k	c1364e8d8a	ops/monitoring: add implr to owners This will fix future reviews from him having to require my +2. Change-Id: Icde1f64fe4387e92d19943d7469ce0569eb45257	2020-06-07 02:23:09 +02:00
q3k	2022ac2338	ops/monitoring: split up jsonnet, add simple docs Change-Id: I8120958a6862411de0446896875766834457aba9	2020-06-06 17:05:15 +02:00
q3k	ce81c39081	ops/metrics: basic cluster setup with prometheus We handwavingly plan on implementing monitoring as a two-tier system: - a 'global' component that is responsible for global aggregation, long-term storage and alerting. - multiple 'per-cluster' components, that collect metrics from Kubernetes clusters and export them to the global component. In addition, several lower tiers (collected by per-cluster components) might also be implemented in the future - for instance, specific to some subprojects. Here we start sketching out some basic jsonnet structure (currently all in a single file, with little parametrization) and a cluster-level prometheus server that scrapes Kubernetes Node and cAdvisor metrics. This review is mostly to get this committed as early as possible, and to make sure that the little existing Prometheus scrape configuration is sane. Change-Id: If37ac3b1243b8b6f464d65fee6d53080c36f992c	2020-06-06 15:56:10 +02:00

10 Commits (63ce423ebbd7afd4575fb02677e5e81c681b037d)