hscloud monitoring
==================

Quick links
-----------

 - *Old Global Dashboard*: [monitoring.hackerspace.pl](https://monitoring.hackerspace.pl) - old monitoring system, unrelated to this one, configured using Chef at management.hackerspace.pl (long since dead). This setup is supposed to replace it.

Architecture
------------

The hscloud monitoring solution is two-tiered:

 - at the *global* tier we run metrics aggregation, long-term storage, dashboards and alerting.
 - at the *agent* tier we collect metrics from various sources (possibly even lower-tiered agents).

All agent-tier agents send metrics to all global instances.

         .--------.  .--------.          '.
         | global |  | global |           > - global tier
         '--------'  '--------'          .'   (contains 'global instances')
            |  '---. .---'  |
            |       X       |
            |  .---' '---.  |
            |  |         |  |
     .--------------.   .--------------------.  '.
     | cluster      |   | hswaw-proxy        |   |
     | k0.hswaw.net |   | waw.hackerspace.pl |    > - agent tier
     '--------------'   '--------------------'  .'   (contains 'agents')

Agent - cluster
---------------

Cluster agents are responsible for collecting Kubernetes cluster metrics. They run a Prometheus server that scrapes kubelet/cadvisor/... metrics and sends them off to global instances.

Global Instances
----------------

Global instances run Victoria Metrics, ingest metrics from all agents, and perform long-term storage. In the future they will also run Grafana and AlertManager.
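
The fan-out described under Architecture (every agent sends its metrics to every global instance) can be sketched as follows. This is only an illustration, not the actual hscloud configuration. It assumes agents push over Prometheus `remote_write` and that global instances accept writes at `/api/v1/write`; this document does not confirm either. The hostnames under `example.invalid` and the helper names `remote_write_section` and `fan_out` are hypothetical.

```python
import json

# Hypothetical global instance endpoints. The real hostnames and the
# transport (assumed here: Prometheus remote_write) are not given in this doc.
GLOBAL_INSTANCES = [
    "https://global-1.example.invalid/api/v1/write",
    "https://global-2.example.invalid/api/v1/write",
]


def remote_write_section(urls):
    """Build a Prometheus-style remote_write list, one entry per global instance."""
    return {"remote_write": [{"url": u} for u in urls]}


def fan_out(agents, urls):
    """Return every (agent, global instance) pair: each agent sends to all instances."""
    return [(agent, url) for agent in agents for url in urls]


if __name__ == "__main__":
    agents = ["cluster k0.hswaw.net", "hswaw-proxy waw.hackerspace.pl"]
    # JSON is valid YAML, so this output could be pasted into a Prometheus config.
    print(json.dumps(remote_write_section(GLOBAL_INSTANCES), indent=2))
    for agent, url in fan_out(agents, GLOBAL_INSTANCES):
        print(f"{agent} -> {url}")
```

With two agents and two global instances this yields four agent-to-instance links, matching the crossed lines in the diagram. Sending to every instance means each global instance holds a full copy of the data, so any one of them can serve dashboards if another is down.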