mirror of https://gerrit.hackerspace.pl/hscloud synced 2024-10-18 03:07:44 +00:00

viq 30a563c49f ops/monitoring/lib/cluster.libsonnet: scrape based on annotations This adds automatic scraping of pods and services based on presence of annotations: - prometheus.io/scrape - prometheus.io/port - prometheus.io/path Change-Id: I1c1afecc75c30278889de1f6ca0b17da69997295 Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1850 Reviewed-by: implr <implr@hackerspace.pl>		2024-01-19 22:02:40 +00:00
doc	ops/monitoring: split up jsonnet, add simple docs	2020-06-06 17:05:15 +02:00
lib	ops/monitoring/lib/cluster.libsonnet: scrape based on annotations	2024-01-19 22:02:40 +00:00
secrets	ops/monitoring: deploy grafana	2020-12-17 22:10:31 +00:00
k0.jsonnet	ops/monitoring: deploy grafana	2020-12-17 22:10:31 +00:00
OWNERS	ops/monitoring: add implr to owners	2020-06-07 02:23:09 +02:00
README.md	monitoring: global: implement	2020-10-06 14:28:27 +00:00

README.md

hscloud monitoring

Quick links

Old Global Dashboard: monitoring.hackerspace.pl - the old monitoring system, unrelated to this one. It was configured using Chef at management.hackerspace.pl (long since dead). The setup described here is meant to replace it.

Architecture

The hscloud monitoring solution is two-tiered:

at the global tier we run metrics aggregation, long-term storage, dashboards and alerting.
at the agent tier we collect metrics from various sources (possibly even lower-tier agents).

All agent-tier agents send metrics to all global instances.

      .--------.     .--------.              '.
      | global |     | global |               > - global tier
      '--------'     '--------'              .'   (contains 'global instances')
        |    '---. .---'    |
        |         X         |
        |    .---' '---.    |
        |    |         |    |
.--------------.     .--------------------. '.
|   cluster    |     |    hswaw-proxy     |  |
| k0.hswaw.net |     | waw.hackerspace.pl |   > - agent tier
'--------------'     '--------------------' .'    (contains 'agents')
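
As a rough illustration of the topology in the diagram, the sketch below enumerates the links implied by "all agents send metrics to all global instances". The agent names are taken from the diagram; the global instance names are hypothetical placeholders, since this README does not name them, and `fan_out` is an illustrative helper rather than anything in hscloud.

```python
from itertools import product

# Agent names come from the diagram above. The global instance names are
# hypothetical placeholders: the README does not say what they are called.
AGENTS = ["k0.hswaw.net", "waw.hackerspace.pl"]
GLOBAL_INSTANCES = ["global-a.example", "global-b.example"]


def fan_out(agents, global_instances):
    """Return every (agent, global instance) link in the two-tier topology."""
    return list(product(agents, global_instances))


links = fan_out(AGENTS, GLOBAL_INSTANCES)
for agent, target in links:
    print(f"{agent} -> {target}")
```

With two agents and two global instances this yields four links, which is the crossed pattern drawn in the diagram. Adding a global instance adds one link per agent.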

Agent - cluster

Cluster agents are responsible for collecting Kubernetes cluster metrics. They run a Prometheus server that scrapes kubelet/cadvisor/... metrics and sends them off to global instances.
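
The latest change to lib/cluster.libsonnet adds scraping of pods and services based on the `prometheus.io/scrape`, `prometheus.io/port` and `prometheus.io/path` annotations. The sketch below shows one plausible way such annotations could be turned into a scrape URL. It is not the actual jsonnet or Prometheus relabeling configuration: the function name `scrape_target`, the requirement that the scrape annotation equals `"true"`, and the `/metrics` default path are assumptions borrowed from the common community convention.

```python
def scrape_target(address, annotations, default_port=None):
    """Return a scrape URL for a pod or service, or None if it is not opted in.

    Assumed semantics (not confirmed by this README):
    - only objects annotated prometheus.io/scrape: "true" are scraped
    - prometheus.io/port overrides the port, if present
    - prometheus.io/path overrides the path, defaulting to /metrics
    """
    if annotations.get("prometheus.io/scrape") != "true":
        return None
    port = annotations.get("prometheus.io/port", default_port)
    path = annotations.get("prometheus.io/path", "/metrics")
    host = f"{address}:{port}" if port else address
    return f"http://{host}{path}"


# Hypothetical pod: the address and annotation values are made up.
pod_annotations = {
    "prometheus.io/scrape": "true",
    "prometheus.io/port": "9100",
}
print(scrape_target("10.0.0.5", pod_annotations))
```

An object without the scrape annotation yields `None` and would simply be skipped.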

Global Instances

Global instances run VictoriaMetrics, ingest metrics from all agents, and perform long-term storage. In the future they will also run Grafana and AlertManager.