mirror of https://gerrit.hackerspace.pl/hscloud
HSCloud Clusters
================

Current cluster: `k0.hswaw.net`
Accessing via kubectl
---------------------

There isn't yet a service for getting short-term user certificates. Instead, you'll have to get admin certificates:
    clustercfg admincreds $(whoami)-admin
    kubectl get nodes
Provisioning nodes
------------------
- bring up a new node with NixOS, running the configuration.nix from bootstrap (to be documented)
- `clustercfg nodestrap bc01nXX.hswaw.net`
That's it!
Ceph
====

We run Ceph via Rook. The Rook operator is running in the `ceph-rook-system` namespace. To debug Ceph issues, start by looking at its logs.
The following Ceph clusters are available:
ceph-waw1
---------

HDDs on bc01n0{1-3}. 3TB total capacity.
The following storage classes use this cluster:

- `waw-hdd-redundant-1` - erasure coded 2+1 (two data shards, one coding shard)
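For illustration, a PersistentVolumeClaim against this class might look like the following sketch. Only the `storageClassName` comes from this document; the claim name, namespace, and size are hypothetical:

```yaml
# Hypothetical example claim; name, namespace and size are made up.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
  namespace: example
spec:
  storageClassName: waw-hdd-redundant-1
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```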
A dashboard is available at https://ceph-waw1.hswaw.net/. To get the admin password, run:

    kubectl -n ceph-waw1 get secret rook-ceph-dashboard-password -o yaml | grep "password:" | awk '{print $2}' | base64 --decode ; echo
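The grep/awk/base64 pipeline just extracts the `password:` field and decodes it, since Kubernetes stores secret values base64-encoded in the resource YAML. The same pipeline run on a fake sample value (not the real password) looks like this:

```shell
# "aHVudGVyMg==" is simply base64 for the dummy string "hunter2".
echo 'password: aHVudGVyMg==' | grep "password:" | awk '{print $2}' | base64 --decode ; echo
# prints "hunter2"
```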
Known Issues
============

After running `nixos-rebuild switch` on the hosts, the shared host/container CNI plugin directory gets nuked, and pods will fail to schedule on that node (TODO(q3k): error message here). To fix this, restart the calico-node pods running on nodes that have this issue. The Calico Node pod will reschedule automatically and fix the CNI plugin directory.
    kubectl -n kube-system get pods -o wide | grep calico-node
    kubectl -n kube-system delete pod calico-node-XXXX
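When only one node is affected, the `-o wide` output can be filtered by its NODE column (column 7) before deleting. A sketch of that filtering on fabricated sample output (the pod and node names below are made up):

```shell
# Fabricated sample of `kubectl -n kube-system get pods -o wide` output;
# column 7 is the node the pod is scheduled on.
sample='calico-node-abcde   1/1   Running   0   3d   10.0.0.1   bc01n01.hswaw.net
calico-node-fghij   1/1   Running   0   3d   10.0.0.2   bc01n02.hswaw.net'
# Print only the pods running on the affected node.
echo "$sample" | awk '$7 == "bc01n01.hswaw.net" {print $1}'
# prints "calico-node-abcde"
```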