10 Commits (63ce423ebbd7afd4575fb02677e5e81c681b037d)

Author SHA1 Message Date
q3k 9f0e1e88f1 cluster/clustercfg: rewrite it in Go
This replaces the old clustercfg script with a brand spanking new,
mostly-equivalent Go reimplementation. It's not exactly the same, though;
here are the differences:

 1. No cluster deployment logic anymore - we expect everyone to use ops/
    machine at this point.
 2. All certs/keys are Ed25519 and do not expire by default - but
    support for short-lived certificates is there, and is actually more
    generic and reusable. Currently it's only used for admincreds.
 3. Speaking of admincreds: the new admincreds automatically figure out
    your username.
 4. admincreds also doesn't shell out to kubectl anymore, and doesn't
    override your default context. The generated creds can live
    peacefully alongside your normal prodaccess creds.
 5. gencerts (the new nodestrap without deployment support) now
    automatically generates certs for all nodes, based on local Nix
    modules in ops/.
 6. No secretstore support. This will be changed once we rebuild
    secretstore in Go. For now users are expected to manually run
    secretstore sync on cluster/secrets.

Change-Id: Ida935f44e04fd933df125905eee10121ac078495
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1498
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-06-19 22:23:52 +00:00
informatic 7e841065b0 *: post-certmanager manifests update
Change-Id: I745c850268c31777c5722a9833c8152a55615aed
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1512
Reviewed-by: q3k <q3k@hackerspace.pl>
2023-06-19 21:20:44 +00:00
q3k cc2ff79f01 ops/monitoring: move grafana to sso.
Change-Id: Ib2ecf6820454a160834db2ac212b31d9d5306972
2021-01-30 17:26:47 +01:00
q3k 4f7caf8d86 ops/monitoring: deploy grafana
This is a basic grafana running on:

    https://monitoring-global-dashboard.k0.hswaw.net/

It contains a data source pointing at the corresponding global victoria
metrics. There are no dashboards yet; these will be provisioned soon via
jsonnet/grafonnet.

Change-Id: I84873bc323d1727096e3ce818fae122a9af3e191
2020-12-17 22:10:31 +00:00
q3k cfc0496266 ops/monitoring: scrape apiserver, scheduler, and controller-manager
These get scraped by public IP address, which is retrieved via service
discovery in Prometheus (by using the endpoints role on the
default/kubernetes service).

Also drive-by fix cluster prometheus resources - the default
configuration wants at least 3GB of physical memory.

Change-Id: I1eedb19051f62b40613f69e5f0f736d5958acf42
2020-12-17 22:09:56 +00:00
q3k 7d311e9602 ops/monitoring: pull in grafonnet-7.0
Change-Id: Ie036ef767419418876a18255a5ad378f5cfa1535
2020-10-10 15:59:45 +00:00
q3k 363bf4f341 monitoring: global: implement
This creates a basic Global instance, running Victoria Metrics on k0.

Change-Id: Ib03003213d79b41cc54efe40cd2c4837f652c0f4
2020-10-06 14:28:27 +00:00
q3k c1364e8d8a ops/monitoring: add implr to owners
This way, future reviews from him will no longer require my +2.

Change-Id: Icde1f64fe4387e92d19943d7469ce0569eb45257
2020-06-07 02:23:09 +02:00
q3k 2022ac2338 ops/monitoring: split up jsonnet, add simple docs
Change-Id: I8120958a6862411de0446896875766834457aba9
2020-06-06 17:05:15 +02:00
q3k ce81c39081 ops/metrics: basic cluster setup with prometheus
We handwavingly plan on implementing monitoring as a two-tier system:

 - a 'global' component that is responsible for global aggregation,
   long-term storage and alerting.
 - multiple 'per-cluster' components, that collect metrics from
   Kubernetes clusters and export them to the global component.

In addition, several lower tiers (collected by per-cluster components)
might also be implemented in the future - for instance, specific to some
subprojects.

Here we start sketching out some basic jsonnet structure (currently all
in a single file, with little parametrization) and a cluster-level
prometheus server that scrapes Kubernetes Node and cAdvisor metrics.

This review is mostly to get this committed as early as possible, and to
make sure that the little existing Prometheus scrape configuration is
sane.

Change-Id: If37ac3b1243b8b6f464d65fee6d53080c36f992c
2020-06-06 15:56:10 +02:00