Commit Graph

15 Commits (cb9cbb3fccecf5768e0d6977deb8caffc7ba9456)

Author SHA1 Message Date
q3k 0ec06d7b75 ops: update deploy instructions to include profile set
This is necessary for the NixOS EFI boot machinery to pick up the new
derivation when switching to it; otherwise, the machine will not boot
into the newly switched configuration.

Change-Id: I8b18956d2afeea09c38462f09a00c345cf86f80d
2021-04-18 18:13:33 +00:00
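The deploy sequence this commit documents can be sketched roughly as follows. This is an illustration only: the attribute path, `$TARGET` host, and exact tooling are placeholders, and hscloud's actual deploy scripts may differ.

```shell
# Build the system derivation for the target machine (placeholder attr path):
SYSTEM=$(nix-build -A '...' --no-out-link)

# Copy the closure to the target machine:
nix copy --to "ssh://root@$TARGET" "$SYSTEM"

# Set the system profile -- the step this commit adds. Without it, the
# EFI boot machinery never points at the new derivation:
ssh "root@$TARGET" nix-env --profile /nix/var/nix/profiles/system --set "$SYSTEM"

# Activate the configuration and (re)install the boot entry:
ssh "root@$TARGET" "$SYSTEM/bin/switch-to-configuration" switch
```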
q3k a0332a75a0 ops/machines: pin edge01.waw to its current version of nixpkgs
Stopgap until we finish b/3, need to deploy some changes on it without
rebooting into newer nixpkgs.

Change-Id: Ic2690dfcb398a419338961c8fcbc7e604298977a
2021-03-18 19:22:41 +00:00
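Pinning a single machine to a specific nixpkgs checkout can be done with a construct along these lines; the rev and hash are placeholders, and hscloud's actual pinning mechanism may differ.

```nix
let
  # Pinned nixpkgs for edge01.waw only; other machines keep following
  # the repository-wide default.
  pinnedNixpkgs = import (builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<rev>.tar.gz";
    sha256 = "<sha256>";
  }) {};
in
  pinnedNixpkgs
```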
informatic 7f8f3e9f9c ops/sso: upgrade sso-v2
A change in sso-v2 unifies id_token and userinfo endpoint handling - the
groups, nickname, email and preferred_username keys are now present in
id_tokens as well.

https://code.hackerspace.pl/informatic/sso-v2/commit/?id=c4c810cd255a7bfcab5ced3fb88c8b311b518c34

Change-Id: Ib22994edc067fd83701590182f8096f6fca692ba
2021-02-01 17:03:27 +01:00
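The practical effect of this change is that a relying party can read these claims straight out of the id_token's payload instead of calling the userinfo endpoint. A minimal sketch of inspecting an (unverified) JWT payload - the token and claim values below are made up; only the claim names come from the commit message:

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    # Re-add the base64url padding that JWT encoding strips.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


# Hypothetical id_token payload after the sso-v2 change: claims that were
# previously only served by the userinfo endpoint now appear here too.
claims = {
    "sub": "q3k",
    "groups": ["staff"],
    "nickname": "q3k",
    "email": "q3k@example.com",
    "preferred_username": "q3k",
}
payload = base64.urlsafe_b64encode(
    json.dumps(claims).encode()).rstrip(b"=").decode()
fake_token = f"header.{payload}.signature"

decoded = decode_jwt_payload(fake_token)
assert {"groups", "nickname", "email", "preferred_username"} <= set(decoded)
```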
q3k 9e3ca9c841 ops/sso: move jsonnets to kube/
This is in preparation for moving the sso source code into hscloud.

Change-Id: I4325df617dc82c17fb4c96762743f0b70122976f
2021-01-31 15:52:06 +01:00
q3k cc2ff79f01 ops/monitoring: move grafana to sso.
Change-Id: Ib2ecf6820454a160834db2ac212b31d9d5306972
2021-01-30 17:26:47 +01:00
q3k d82807e024 Merge changes I84873bc3,I1eedb190
* changes:
  ops/monitoring: deploy grafana
  ops/monitoring: scrape apiserver, scheduler, and controller-manager
2021-01-30 16:22:44 +00:00
informatic d6c97596cd ops/sso: "the hackerspace oidc/oauth2 provider" deployment
Change-Id: I092b844364ed30037eff00188dcdf5d6d3c228c5
2021-01-29 23:23:09 +01:00
q3k 4f7caf8d86 ops/monitoring: deploy grafana
This is a basic grafana instance running at:

    https://monitoring-global-dashboard.k0.hswaw.net/

It contains a data source pointing at the corresponding global Victoria
Metrics instance. There are no dashboards yet; these will be provisioned
soon via jsonnet/grafonnet.

Change-Id: I84873bc323d1727096e3ce818fae122a9af3e191
2020-12-17 22:10:31 +00:00
q3k cfc0496266 ops/monitoring: scrape apiserver, scheduler, and controller-manager
These get scraped by public IP address, retrieved via service discovery
in Prometheus (using the endpoints role on the default/kubernetes
service).

Also includes a drive-by fix to the cluster prometheus resources - the
default configuration wants at least 3GB of physical memory.

Change-Id: I1eedb19051f62b40613f69e5f0f736d5958acf42
2020-12-17 22:09:56 +00:00
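A Prometheus scrape job of the shape described above might look like the following. This is a sketch only - the actual configuration in hscloud is generated from jsonnet, and the job name is a placeholder:

```yaml
scrape_configs:
  - job_name: apiserver
    scheme: https
    tls_config:
      insecure_skip_verify: true   # sketch only; a real config should verify certs
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Keep only endpoints of the default/kubernetes service, i.e. the
      # apiservers' public addresses.
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        action: keep
        regex: default;kubernetes
```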
q3k 7d311e9602 ops/monitoring: pull in grafonnet-7.0
Change-Id: Ie036ef767419418876a18255a5ad378f5cfa1535
2020-10-10 15:59:45 +00:00
q3k 363bf4f341 monitoring: global: implement
This creates a basic Global instance, running Victoria Metrics on k0.

Change-Id: Ib03003213d79b41cc54efe40cd2c4837f652c0f4
2020-10-06 14:28:27 +00:00
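In this design, a per-cluster Prometheus can forward its samples to the global Victoria Metrics instance via Prometheus remote write, which Victoria Metrics accepts on its /api/v1/write endpoint. The URL below is a placeholder, not the actual k0 endpoint:

```yaml
remote_write:
  - url: https://<global-victoria-metrics>/api/v1/write
```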
q3k 6abe4fa771 bgpwtf/machines: init edge01.waw
This configures our WAW edge router using NixOS. This replaces our
previous Ubuntu installation.

Change-Id: Ibd72bde66ec413164401da407c5b268ad83fd3af
2020-10-03 14:57:38 +00:00
q3k c1364e8d8a ops/monitoring: add implr to owners
This way, future reviews from him will no longer require my +2.

Change-Id: Icde1f64fe4387e92d19943d7469ce0569eb45257
2020-06-07 02:23:09 +02:00
q3k 2022ac2338 ops/monitoring: split up jsonnet, add simple docs
Change-Id: I8120958a6862411de0446896875766834457aba9
2020-06-06 17:05:15 +02:00
q3k ce81c39081 ops/metrics: basic cluster setup with prometheus
We handwavingly plan on implementing monitoring as a two-tier system:

 - a 'global' component that is responsible for global aggregation,
   long-term storage and alerting.
 - multiple 'per-cluster' components, that collect metrics from
   Kubernetes clusters and export them to the global component.

In addition, several lower tiers (collected by per-cluster components)
might also be implemented in the future - for instance, specific to some
subprojects.

Here we start sketching out some basic jsonnet structure (currently all
in a single file, with little parametrization) and a cluster-level
prometheus server that scrapes Kubernetes Node and cAdvisor metrics.

This review is mostly to get this committed as early as possible, and to
make sure that the little existing Prometheus scrape configuration is
sane.

Change-Id: If37ac3b1243b8b6f464d65fee6d53080c36f992c
2020-06-06 15:56:10 +02:00
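The cluster-level scrape configuration this commit sketches corresponds roughly to the following hand-written YAML equivalent of the jsonnet (the credential paths are the standard in-cluster service account locations; the generated config may differ in detail):

```yaml
scrape_configs:
  # Kubelet (Node) metrics.
  - job_name: kubernetes-nodes
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node

  # Per-container cAdvisor metrics, exposed by the kubelet.
  - job_name: kubernetes-cadvisor
    scheme: https
    metrics_path: /metrics/cadvisor
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
```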