hscloud monitoring
Quick links
- Old Global Dashboard: monitoring.hackerspace.pl - old monitoring system, unrelated to this one, configured using Chef at management.hackerspace.pl (long since dead). This setup is supposed to replace it.
Architecture
The hscloud monitoring solution is two-tiered:
- at the global tier we run metrics aggregation, long-term storage, dashboard and alerting.
- at the agent tier we collect metrics from various sources (possibly including lower-tier agents).
All agent-tier agents send metrics to all global instances.
       .--------.     .--------.                '.
       | global |     | global |                  > - global tier
       '--------'     '--------'                .'    (contains 'global instances')
          |   '---. .---'   |
          |        X        |
          |   .---' '---.   |
          |   |         |   |
    .--------------.   .--------------------.   '.
    |   cluster    |   |    hswaw-proxy     |    |
    | k0.hswaw.net |   | waw.hackerspace.pl |     > - agent tier
    '--------------'   '--------------------'   .'    (contains 'agents')
Agent - cluster
Cluster agents are responsible for collecting Kubernetes cluster metrics. Each one runs a Prometheus server that scrapes kubelet/cadvisor/... metrics and sends them off to the global instances.
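As a rough illustration of the fan-out described above (every agent sends to every global instance), the Go sketch below renders a Prometheus `remote_write` section with one entry per global instance. The hostnames are hypothetical, the function names `remoteWriteURLs` and `renderRemoteWrite` are made up for this example, and the port and path (`8428`, `/api/v1/write`) are the VictoriaMetrics single-node defaults, which this setup may or may not use. It is not how the configuration in this directory is actually generated.

```go
package main

import (
	"fmt"
	"strings"
)

// remoteWriteURLs returns one remote-write URL per global instance.
// Assumption: each global instance accepts Prometheus remote write on the
// VictoriaMetrics single-node default port and path.
func remoteWriteURLs(globalHosts []string) []string {
	urls := make([]string, 0, len(globalHosts))
	for _, h := range globalHosts {
		urls = append(urls, fmt.Sprintf("http://%s:8428/api/v1/write", h))
	}
	return urls
}

// renderRemoteWrite renders the remote_write section of a Prometheus
// configuration file, with one list entry per URL.
func renderRemoteWrite(urls []string) string {
	var b strings.Builder
	b.WriteString("remote_write:\n")
	for _, u := range urls {
		fmt.Fprintf(&b, "  - url: %q\n", u)
	}
	return b.String()
}

func main() {
	// Hypothetical hostnames, for illustration only.
	globals := []string{"global-a.example.com", "global-b.example.com"}
	fmt.Print(renderRemoteWrite(remoteWriteURLs(globals)))
}
```

Listing every global instance in each agent's configuration means that adding a global instance only requires extending the list of hosts; the agents themselves do not need to know about each other.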
Global Instances
Global instances run VictoriaMetrics, ingest metrics from all agents, and perform long-term storage. In the future they will also run Grafana and Alertmanager.