HSCloud Clusters
================
Current cluster: `k0.hswaw.net`
Accessing via kubectl
---------------------
    prodaccess         # get a short-lived certificate for your use via SSO
    kubectl version
    kubectl top nodes
Every user gets a `personal-$username` namespace. Feel free to use it for your own purposes, but watch out for resource usage!
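
For example (assuming `$USER` matches your SSO username, and that `prodaccess` has already been run), using your personal namespace could look like:

```shell
# The namespace name is derived from your SSO username.
ns="personal-${USER}"
echo "using namespace: $ns"

# Illustrative commands to run against it (need cluster access):
#   kubectl -n "$ns" create deployment www --image=nginx
#   kubectl -n "$ns" get pods
```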
Persistent Storage
------------------
HDDs on bc01n0{1-3}. 3TB total capacity.
The following storage classes use this cluster:
- `waw-hdd-paranoid-1` - 3 replicas
- `waw-hdd-redundant-1` - erasure coded 2.1
- `waw-hdd-yolo-1` - unreplicated (you _will_ lose your data)
- `waw-hdd-redundant-1-object` - erasure coded 2.1 object store
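
As a sketch, a PVC using one of these classes could look like the following (the claim name and size are made up for illustration):

```shell
# Print an illustrative PVC manifest bound to the erasure-coded class;
# pipe it into `kubectl apply -f -` (in your namespace) to actually create it.
cat <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: waw-hdd-redundant-1
  resources:
    requests:
      storage: 10Gi
EOF
```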
Rados Gateway (S3) is available at https://object.ceph-waw2.hswaw.net/. To create a user, ask an admin.
PersistentVolumes currently bound to PVCs get automatically backed up (hourly for the next 48 hours, then once every 4 weeks, then once every month for a year).
Administration
==============
Provisioning nodes
------------------
- bring up a new node with NixOS, running the configuration.nix from bootstrap (to be documented)
- `bazel run //cluster/clustercfg nodestrap bc01nXX.hswaw.net`
Ceph - Debugging
----------------
We run Ceph via Rook. The Rook operator is running in the `ceph-rook-system` namespace. To debug Ceph issues, start by looking at its logs.
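
For instance (the `app=rook-ceph-operator` label is an assumption based on stock Rook deployments; verify it with `kubectl -n ceph-rook-system get pods --show-labels`):

```shell
# The operator pod's name changes across restarts, so select it by label.
sel="app=rook-ceph-operator"
echo "kubectl -n ceph-rook-system logs -l $sel --tail=100"
# Run the printed command against the cluster to see recent operator logs.
```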
A dashboard is available at https://ceph-waw2.hswaw.net/. To get the admin password, run:
    kubectl -n ceph-waw2 get secret rook-ceph-dashboard-password -o yaml | grep "password:" | awk '{print $2}' | base64 --decode ; echo
Ceph - Backups
--------------
Kubernetes PVs backed by Ceph RBDs get backed up using Benji. An hourly cronjob runs in every Ceph cluster. You can also manually trigger a run with:
    kubectl -n ceph-waw2 create job --from=cronjob/ceph-waw2-benji ceph-waw2-benji-manual-$(date +%s)
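
The `$(date +%s)` suffix simply gives each manual run a unique job name. A sketch for following such a run (same resource names as above):

```shell
# Unique job name, same scheme as the create-job command above.
name="ceph-waw2-benji-manual-$(date +%s)"
echo "job name: $name"
# Follow the run with (needs cluster access):
#   kubectl -n ceph-waw2 get job "$name"
#   kubectl -n ceph-waw2 logs -f "job/$name"
```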
Ceph ObjectStorage pools (RADOSGW) are _not_ backed up yet!
Ceph - Object Storage
---------------------
To create an object store user, consult the rook.io manual
(https://rook.io/docs/rook/v0.9/ceph-object-store-user-crd.html).
The user authentication secret is generated in the Ceph cluster namespace
(`ceph-waw2`) and thus may need to be manually copied into the application
namespace (see the comment in `app/registry/prod.jsonnet`).
`tools/rook-s3cmd-config` can be used to generate a test configuration file
for s3cmd. Remember to append `:default-placement` to your region name
(e.g. `waw-hdd-redundant-1-object:default-placement`).
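
For example, once `tools/rook-s3cmd-config` has produced a config file (`s3cfg` below is an assumed name), bucket operations might look like:

```shell
# Region = storage class name plus ":default-placement", as noted above.
region="waw-hdd-redundant-1-object:default-placement"
echo "bucket_location = $region"
# Illustrative s3cmd calls (need valid credentials in the config file):
#   s3cmd --config=s3cfg mb "s3://my-bucket" --region="$region"
#   s3cmd --config=s3cfg ls
```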