HSCloud Clusters
================

Current cluster: `k0.hswaw.net`

Accessing via kubectl
---------------------

    prodaccess # get a short-lived certificate for your use via SSO
    kubectl get nodes
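The certificate issued by `prodaccess` is short-lived, so a failing `kubectl` often just means it expired. A sketch for checking the expiry date, assuming your kubeconfig embeds the client certificate inline (`client-certificate-data`) rather than referencing a file:

```shell
# Print the expiry of the prodaccess-issued client certificate.
# Assumes the first kubeconfig user entry holds inline certificate data;
# adjust the jsonpath if your setup writes a file path instead.
kubectl config view --raw \
    -o jsonpath='{.users[0].user.client-certificate-data}' \
    | base64 --decode \
    | openssl x509 -noout -enddate
```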

Persistent Storage
------------------

HDDs on bc01n0{1-3}. 3TB total capacity.

The following storage classes use this cluster:

 - `waw-hdd-paranoid-1` - 3 replicas
 - `waw-hdd-redundant-1` - erasure coded 2.1
 - `waw-hdd-yolo-1` - unreplicated (you _will_ lose your data)
 - `waw-hdd-redundant-1-object` - erasure coded 2.1 object store
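To use one of these classes, reference it by name in a PersistentVolumeClaim. A hypothetical manifest (the claim name, namespace, and size below are made up):

```shell
# Write a sample PVC manifest; apply it with: kubectl apply -f pvc.yaml
# The metadata and requested size are illustrative, not a real workload.
cat > pvc.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
  namespace: example
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: waw-hdd-redundant-1
  resources:
    requests:
      storage: 10Gi
EOF
```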

A dashboard is available at https://ceph-waw1.hswaw.net/. To get the admin password, run:

    kubectl -n ceph-waw1 get secret rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 --decode ; echo

Rados Gateway (S3) is available at https://object.ceph-waw1.hswaw.net/. To create
an object store user, consult the rook.io manual
(https://rook.io/docs/rook/v0.9/ceph-object-store-user-crd.html).
The user authentication secret is generated in the Ceph cluster namespace
(`ceph-waw1`), so it may need to be manually copied into the application
namespace (see the comment in `app/registry/prod.jsonnet`).
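One way to do that copy is to rewrite the secret's namespace on the fly. The `SECRET_NAME` and `APP_NAMESPACE` placeholders below are illustrative (Rook generates object user secrets under its own naming scheme); the deleted fields are server-set metadata that would otherwise trip up `kubectl apply`:

```shell
# Copy a generated user secret from the Ceph namespace into an
# application namespace. Substitute real names for the placeholders.
kubectl -n ceph-waw1 get secret SECRET_NAME -o yaml \
    | sed -e 's/namespace: ceph-waw1/namespace: APP_NAMESPACE/' \
          -e '/resourceVersion:/d' -e '/uid:/d' \
          -e '/creationTimestamp:/d' -e '/selfLink:/d' \
    | kubectl apply -f -
```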

`tools/rook-s3cmd-config` can be used to generate a test configuration file for
s3cmd. Remember to append `:default-placement` to your region name (e.g.
`waw-hdd-redundant-1-object:default-placement`).
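For orientation, a minimal s3cmd configuration for this gateway might look like the sketch below; the keys are placeholders and `tools/rook-s3cmd-config` produces the real values:

```shell
# Write a minimal s3cmd config sketch; access/secret keys are placeholders.
# Note :default-placement appended to the region name.
cat > rook.s3cfg <<EOF
[default]
host_base = object.ceph-waw1.hswaw.net
host_bucket = object.ceph-waw1.hswaw.net
bucket_location = waw-hdd-redundant-1-object:default-placement
access_key = PLACEHOLDER
secret_key = PLACEHOLDER
use_https = True
EOF
# then, for example: s3cmd -c rook.s3cfg ls
```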

Administration
==============

Provisioning nodes
------------------

 - bring up a new node with NixOS, running the configuration.nix from bootstrap (to be documented)
 - `bazel run //cluster/clustercfg:clustercfg nodestrap bc01nXX.hswaw.net`

That's it!

Ceph
====

We run Ceph via Rook. The Rook operator is running in the `ceph-rook-system` namespace. To debug Ceph issues, start by looking at its logs.
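A sketch of pulling those logs, assuming the operator pod carries Rook's default `app=rook-ceph-operator` label:

```shell
# Surface recent error lines from the Rook operator logs.
kubectl -n ceph-rook-system logs -l app=rook-ceph-operator --tail=500 \
    | grep -iE 'error|fail' || true
```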

The following Ceph clusters are available: