This makes the hscloud readTree object available as follows in NixOS
modules:
  { config, pkgs, workspace, ... }: {
    environment.systemPackages = [
      workspace.hswaw.laserproxy
    ];
  }
Change-Id: I9c8146f5156ffe5d06cb8408a2ce632657990d59
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1164
Reviewed-by: q3k <q3k@hackerspace.pl>
First pass at a non-rook-managed Ceph cluster. We call it k0 instead of
ceph-waw4, as we are now fairly sure that we will always have a
one-kube-cluster-to-one-ceph-cluster correspondence, with different Ceph
pools for different media kinds (if we differentiate at all).
For now this has one mon and spinning rust OSDs. This can be iterated on
to make it less terrible with time.
See b/6 for more details.
Change-Id: Ie502a232c700af93f33fcad9fa1c57058161aa11
This moves the diff-and-activate logic from cluster/nix/provision.nix
into ops/{provision,machines}.nix, so that it can be used for both
cluster machines and bgpwtf machines.
The provisioning scripts now live per NixOS configuration, and anything
under ops.machines.$fqdn now has a .passthru.hscloud.provision
derivation containing that script. When run, it will attempt to deploy
onto the target machine.
There's also a top-level tool at `ops.provision` which builds all
configurations/machines and can be invoked with a machine name/fqdn to
run the corresponding provisioner script.
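As a sketch only (the evaluation helper, target fqdn and script body
below are illustrative assumptions, not the real ops/ code), the
per-machine provisioner described above could be built roughly like
this:

  # Hypothetical sketch of a per-machine provisioner derivation exposed
  # as passthru.hscloud.provision; details are assumptions.
  { pkgs, ... }:
  let
    machine = import <nixpkgs/nixos> {
      configuration = ./example.hswaw.net.nix;
    };
    toplevel = machine.config.system.build.toplevel;
  in {
    passthru.hscloud.provision =
      pkgs.writeShellScript "provision-example.hswaw.net" ''
        set -eu
        # Copy the built system closure to the target and activate it.
        nix-copy-closure --to root@example.hswaw.net ${toplevel}
        ssh root@example.hswaw.net \
          ${toplevel}/bin/switch-to-configuration switch
      '';
  }

With the `ops.provision` entry point, provisioning a machine then boils
down to something like `nix-build -A ops.provision` followed by running
the resulting script with the machine's name/fqdn (the exact invocation
may differ).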
clustercfg is changed to use the new provisioning logic.
Change-Id: I258abce9e8e3db42af35af102f32ab7963046353
This is necessary for the NixOS EFI boot machinery to pick up the new
derivation when switching to it; otherwise, the machine will not boot
into the newly switched configuration.
Change-Id: I8b18956d2afeea09c38462f09a00c345cf86f80d
Stopgap until we finish b/3; we need to deploy some changes on it
without rebooting into a newer nixpkgs.
Change-Id: Ic2690dfcb398a419338961c8fcbc7e604298977a
This is a basic grafana running on:
https://monitoring-global-dashboard.k0.hswaw.net/
It contains a data source pointing at the corresponding global victoria
metrics. There are no dashboards yet; these will be provisioned soon via
jsonnet/grafonnet.
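Sketch only: the data source described above has the shape of a standard
Grafana datasource provisioning document, written here as a Nix attrset
purely for illustration; the name and URL are assumptions and the actual
deployment may define this differently.

  # Hypothetical sketch of a Grafana datasource provisioning document
  # pointing at a Prometheus-compatible victoria metrics endpoint.
  builtins.toJSON {
    apiVersion = 1;
    datasources = [{
      name = "victoria-global";  # assumed name
      type = "prometheus";
      access = "proxy";
      url = "http://victoria-metrics.monitoring-global.svc:8428";  # assumed URL
      isDefault = true;
    }];
  }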
Change-Id: I84873bc323d1727096e3ce818fae122a9af3e191
These get scraped by public IP address, which is retrieved via service
discovery in Prometheus (using the endpoints role on the
default/kubernetes service).
Also a drive-by fix to the cluster prometheus resources - the default
configuration wants at least 3GB of physical memory.
Change-Id: I1eedb19051f62b40613f69e5f0f736d5958acf42
We handwavingly plan on implementing monitoring as a two-tier system:
- a 'global' component that is responsible for global aggregation,
  long-term storage and alerting.
- multiple 'per-cluster' components that collect metrics from
  Kubernetes clusters and export them to the global component.
In addition, several lower tiers (collected by per-cluster components)
might also be implemented in the future - for instance, tiers specific
to some subprojects.
Here we start sketching out some basic jsonnet structure (currently all
in a single file, with little parametrization) and a cluster-level
prometheus server that scrapes Kubernetes Node and cAdvisor metrics.
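For reference only (this mirrors the upstream Prometheus Kubernetes
example configuration, not necessarily the jsonnet added here), the
cAdvisor half of that scraping typically looks like the following,
written as a Nix attrset mirroring the Prometheus YAML:

  # Hypothetical sketch of a cAdvisor scrape job using node-role service
  # discovery and the apiserver proxy path.
  {
    job_name = "kubernetes-cadvisor";
    scheme = "https";
    tls_config.ca_file = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt";
    bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token";
    kubernetes_sd_configs = [ { role = "node"; } ];
    relabel_configs = [
      { action = "labelmap"; regex = "__meta_kubernetes_node_label_(.+)"; }
      { target_label = "__address__"; replacement = "kubernetes.default.svc:443"; }
      {
        source_labels = [ "__meta_kubernetes_node_name" ];
        regex = "(.+)";
        target_label = "__metrics_path__";
        replacement = "/api/v1/nodes/\${1}/proxy/metrics/cadvisor";
      }
    ];
  }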
This review is mostly to get this committed as early as possible, and to
make sure that the little existing Prometheus scrape configuration is
sane.
Change-Id: If37ac3b1243b8b6f464d65fee6d53080c36f992c