
Merge "cluster/kube: split up cluster.jsonnet"

master
q3k 2020-06-13 17:52:27 +00:00 committed by Gerrit Code Review
commit 9b2ce179a8
7 changed files with 396 additions and 296 deletions


@@ -8,7 +8,9 @@ Current cluster: `k0.hswaw.net`
Persistent Storage (waw2)
-------------------------
HDDs on bc01n0{1-3}. 3TB total capacity. Don't use this as this pool should go
away soon (the disks are slow, the network is slow and the RAID controllers
lie). Use ceph-waw3 instead.
The following storage classes use this cluster:
@@ -17,9 +19,12 @@ The following storage classes use this cluster:
- `waw-hdd-yolo-1` - unreplicated (you _will_ lose your data)
- `waw-hdd-redundant-1-object` - erasure coded 2.1 object store
Rados Gateway (S3) is available at https://object.ceph-waw2.hswaw.net/. To
create a user, ask an admin.
PersistentVolumes currently bound to PersistentVolumeClaims get automatically
backed up (hourly for the next 48 hours, then once every 4 weeks, then once
every month for a year).
Persistent Storage (waw3)
-------------------------
@@ -32,9 +37,12 @@ The following storage classes use this cluster:
- `waw-hdd-redundant-3` - 2 replicas
- `waw-hdd-redundant-3-object` - 2 replicas, object store
Rados Gateway (S3) is available at https://object.ceph-waw3.hswaw.net/. To
create a user, ask an admin.
PersistentVolumes currently bound to PVCs get automatically backed up (hourly
for the next 48 hours, then once every 4 weeks, then once every month for a
year).
Administration
==============
@@ -42,25 +50,55 @@ Administration
Provisioning nodes
------------------
- bring up a new node with NixOS; the configuration doesn't matter and will be
nuked anyway
- edit cluster/nix/defs-machines.nix
- `bazel run //cluster/clustercfg nodestrap bc01nXX.hswaw.net`
Applying kubecfg state
----------------------
First, decrypt/sync all secrets:
secretstore sync cluster/secrets/
Then, run kubecfg. There are multiple top-level 'view' files that you can run,
all located in `//cluster/kube`. All of them use `k0.libsonnet` as the master
state of the Kubernetes configuration and just expose subsets of it, to work
around the fact that kubecfg gets somewhat slow with a lot of resources.
- `k0.jsonnet`: everything that is defined for k0 in `//cluster/kube/...`.
- `k0-core.jsonnet`: definitions that are common across all clusters
(networking, registry, etc.), without Rook.
- `k0-registry.jsonnet`: just the docker registry on k0 (useful when changing
ACLs).
- `k0-ceph.jsonnet`: everything ceph/rook related on k0.
When in doubt, run `k0.jsonnet`. There's no harm in doing so; it might just be
slow. Running an individual view file without realizing that your change also
affected something rendered in another file can cause production
inconsistencies (see the example below).
Feel free to add more view files for typical administrative tasks.
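For example, to preview and then apply the full k0 definition (a sketch; plain
kubecfg invocations are assumed here, and the repo may wrap kubecfg in its own
tooling):
kubecfg diff cluster/kube/k0.jsonnet
kubecfg update cluster/kube/k0.jsonnet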
Ceph - Debugging
-----------------
We run Ceph via Rook. The Rook operator is running in the `ceph-rook-system`
namespace. To debug Ceph issues, start by looking at its logs.
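For example, to tail the operator's logs (a sketch; the `app=rook-ceph-operator`
label is the upstream Rook default and is assumed to apply here):
kubectl -n ceph-rook-system logs -l app=rook-ceph-operator --tail=100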
A dashboard is available at https://ceph-waw2.hswaw.net/ and
https://ceph-waw3.hswaw.net/; to get the admin password, run:
kubectl -n ceph-waw2 get secret rook-ceph-dashboard-password -o yaml | grep "password:" | awk '{print $2}' | base64 --decode ; echo
kubectl -n ceph-waw3 get secret rook-ceph-dashboard-password -o yaml | grep "password:" | awk '{print $2}' | base64 --decode ; echo
Ceph - Backups
--------------
Kubernetes PVs backed by Ceph RBDs get backed up using Benji. An hourly cronjob
runs in every Ceph cluster. You can also manually trigger a run by doing:
kubectl -n ceph-waw2 create job --from=cronjob/ceph-waw2-benji ceph-waw2-benji-manual-$(date +%s)
kubectl -n ceph-waw3 create job --from=cronjob/ceph-waw3-benji ceph-waw3-benji-manual-$(date +%s)
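To check on such a manual run afterwards (a sketch; substitute the timestamped
job name created by the command above):
kubectl -n ceph-waw3 get jobs
kubectl -n ceph-waw3 logs job/ceph-waw3-benji-manual-<timestamp>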
@@ -70,10 +108,12 @@ Ceph ObjectStorage pools (RADOSGW) are _not_ backed up yet!
Ceph - Object Storage
---------------------
To create an object store user, consult the rook.io manual
(https://rook.io/docs/rook/v0.9/ceph-object-store-user-crd.html).
The user authentication secret is generated in the Ceph cluster namespace
(`ceph-waw{2,3}`) and thus may need to be manually copied into the application
namespace (see the comment in `app/registry/prod.jsonnet`).
`tools/rook-s3cmd-config` can be used to generate a test configuration file for
s3cmd. Remember to append `:default-placement` to your region name (i.e.
`waw-hdd-redundant-3-object:default-placement`).
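Once the user exists, Rook writes its credentials to a Secret in the Ceph
cluster namespace; the `rook-ceph-object-user-<store>-<user>` name and the
`AccessKey`/`SecretKey` data keys below are upstream Rook conventions and are
assumed to apply here. For example:
kubectl -n ceph-waw3 get secret rook-ceph-object-user-waw-hdd-redundant-3-object-nextcloud -o jsonpath='{.data.AccessKey}' | base64 --decode ; echo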


@@ -0,0 +1,221 @@
# Common cluster configuration.
# This defines what Kubernetes resources are required to turn a bare k8s
# deployment into a fully working cluster.
# These assume that you're running on bare metal, and using the corresponding
# NixOS deployment that we do.
local kube = import "../../kube/kube.libsonnet";
local policies = import "../../kube/policies.libsonnet";
local calico = import "lib/calico.libsonnet";
local certmanager = import "lib/cert-manager.libsonnet";
local coredns = import "lib/coredns.libsonnet";
local metallb = import "lib/metallb.libsonnet";
local metrics = import "lib/metrics.libsonnet";
local nginx = import "lib/nginx.libsonnet";
local prodvider = import "lib/prodvider.libsonnet";
local rook = import "lib/rook.libsonnet";
local pki = import "lib/pki.libsonnet";
{
Cluster(short, realm):: {
local cluster = self,
local cfg = cluster.cfg,
short:: short,
realm:: realm,
fqdn:: "%s.%s" % [cluster.short, cluster.realm],
cfg:: {
// Storage class used for internal services (like registry). This must
// be set to a valid storage class. This can either be a cloud provider class
// (when running on GKE &co) or a storage class created using rook.
storageClassNameRedundant: error "storageClassNameRedundant must be set",
},
// These are required to let the API Server contact kubelets.
crAPIServerToKubelet: kube.ClusterRole("system:kube-apiserver-to-kubelet") {
metadata+: {
annotations+: {
"rbac.authorization.kubernetes.io/autoupdate": "true",
},
labels+: {
"kubernetes.io/bootstrapping": "rbac-defaults",
},
},
rules: [
{
apiGroups: [""],
resources: ["nodes/%s" % r for r in [ "proxy", "stats", "log", "spec", "metrics" ]],
verbs: ["*"],
},
],
},
crbAPIServer: kube.ClusterRoleBinding("system:kube-apiserver") {
roleRef: {
apiGroup: "rbac.authorization.k8s.io",
kind: "ClusterRole",
name: cluster.crAPIServerToKubelet.metadata.name,
},
subjects: [
{
apiGroup: "rbac.authorization.k8s.io",
kind: "User",
# A cluster API Server authenticates with a certificate whose CN is == to the FQDN of the cluster.
name: cluster.fqdn,
},
],
},
// This ClusterRole is bound to all humans that log in via prodaccess/prodvider/SSO.
// It should allow viewing of non-sensitive data for debugability and openness.
crViewer: kube.ClusterRole("system:viewer") {
rules: [
{
apiGroups: [""],
resources: [
"nodes",
"namespaces",
"pods",
"configmaps",
"services",
],
verbs: ["list"],
},
{
apiGroups: ["metrics.k8s.io"],
resources: [
"nodes",
"pods",
],
verbs: ["list"],
},
{
apiGroups: ["apps"],
resources: [
"statefulsets",
],
verbs: ["list"],
},
{
apiGroups: ["extensions"],
resources: [
"deployments",
"ingresses",
],
verbs: ["list"],
}
],
},
// This ClusterRole is applied (scoped to personal namespace) to all humans.
crFullInNamespace: kube.ClusterRole("system:admin-namespace") {
rules: [
{
apiGroups: ["", "extensions", "apps"],
resources: ["*"],
verbs: ["*"],
},
{
apiGroups: ["batch"],
resources: ["jobs", "cronjobs"],
verbs: ["*"],
},
],
},
// This ClusterRoleBindings allows root access to cluster admins.
crbAdmins: kube.ClusterRoleBinding("system:admins") {
roleRef: {
apiGroup: "rbac.authorization.k8s.io",
kind: "ClusterRole",
name: "cluster-admin",
},
subjects: [
{
apiGroup: "rbac.authorization.k8s.io",
kind: "User",
name: user + "@hackerspace.pl",
} for user in [
"q3k",
"implr",
"informatic",
]
],
},
podSecurityPolicies: policies.Cluster {},
allowInsecureNamespaces: [
policies.AllowNamespaceInsecure("kube-system"),
policies.AllowNamespaceInsecure("metallb-system"),
],
// Allow all service accounts (thus all controllers) to create secure pods.
crbAllowServiceAccountsSecure: kube.ClusterRoleBinding("policy:allow-all-secure") {
roleRef_: cluster.podSecurityPolicies.secureRole,
subjects: [
{
kind: "Group",
apiGroup: "rbac.authorization.k8s.io",
name: "system:serviceaccounts",
}
],
},
// Calico network fabric
calico: calico.Environment {},
// CoreDNS for this cluster.
dns: coredns.Environment {
cfg+: {
cluster_domains: [
"cluster.local",
cluster.fqdn,
],
},
},
// Metrics Server
metrics: metrics.Environment {},
// Metal Load Balancer
metallb: metallb.Environment {},
// Main nginx Ingress Controller
nginx: nginx.Environment {},
// Cert-manager (Let's Encrypt, CA, ...)
certmanager: certmanager.Environment {},
issuer: kube.ClusterIssuer("letsencrypt-prod") {
spec: {
acme: {
server: "https://acme-v02.api.letsencrypt.org/directory",
email: "bofh@hackerspace.pl",
privateKeySecretRef: {
name: "letsencrypt-prod"
},
http01: {},
},
},
},
// Rook Ceph storage operator.
rook: rook.Operator {
operator+: {
spec+: {
replicas: 1,
},
},
},
// TLS PKI machinery (compatibility with mirko)
pki: pki.Environment(cluster.short, cluster.realm),
// Prodvider
prodvider: prodvider.Environment {
cfg+: {
apiEndpoint: "kubernetes.default.svc.%s" % [cluster.fqdn],
},
},
},
}


@@ -0,0 +1,8 @@
// Ceph operator (rook), pools, users.
local k0 = (import "k0.libsonnet").k0;
{
rook: k0.cluster.rook,
ceph: k0.ceph,
}


@@ -0,0 +1,6 @@
// Only the 'core' cluster resources - i.e., resources not specific to k0 in particular.
// Without Rook, to speed things up.
(import "k0.libsonnet").k0.cluster {
rook+:: {},
}


@@ -0,0 +1,3 @@
// Only the registry running in k0.
(import "k0.libsonnet").k0.registry

cluster/kube/k0.jsonnet Normal file

@@ -0,0 +1,3 @@
// Everything in the k0 cluster definition.
(import "k0.libsonnet").k0


@@ -1,268 +1,62 @@
# Top level cluster configuration.
// k0.hswaw.net kubernetes cluster
// This defines the cluster as a single object.
// Use the sibling k0*.jsonnet 'view' files to actually apply the configuration.
local kube = import "../../kube/kube.libsonnet";
local policies = import "../../kube/policies.libsonnet";
local calico = import "lib/calico.libsonnet";
local certmanager = import "lib/cert-manager.libsonnet";
local cluster = import "cluster.libsonnet";
local cockroachdb = import "lib/cockroachdb.libsonnet";
local coredns = import "lib/coredns.libsonnet";
local metallb = import "lib/metallb.libsonnet";
local metrics = import "lib/metrics.libsonnet";
local nginx = import "lib/nginx.libsonnet";
local prodvider = import "lib/prodvider.libsonnet";
local registry = import "lib/registry.libsonnet";
local rook = import "lib/rook.libsonnet";
local pki = import "lib/pki.libsonnet";
local Cluster(short, realm) = {
local cluster = self,
local cfg = cluster.cfg,
short:: short,
realm:: realm,
fqdn:: "%s.%s" % [cluster.short, cluster.realm],
cfg:: {
// Storage class used for internal services (like registry). This must
// be set to a valid storage class. This can either be a cloud provider class
// (when running on GKE &co) or a storage class created using rook.
storageClassNameRedundant: error "storageClassNameRedundant must be set",
},
// These are required to let the API Server contact kubelets.
crAPIServerToKubelet: kube.ClusterRole("system:kube-apiserver-to-kubelet") {
metadata+: {
annotations+: {
"rbac.authorization.kubernetes.io/autoupdate": "true",
},
labels+: {
"kubernetes.io/bootstrapping": "rbac-defaults",
},
},
rules: [
{
apiGroups: [""],
resources: ["nodes/%s" % r for r in [ "proxy", "stats", "log", "spec", "metrics" ]],
verbs: ["*"],
},
],
},
crbAPIServer: kube.ClusterRoleBinding("system:kube-apiserver") {
roleRef: {
apiGroup: "rbac.authorization.k8s.io",
kind: "ClusterRole",
name: cluster.crAPIServerToKubelet.metadata.name,
},
subjects: [
{
apiGroup: "rbac.authorization.k8s.io",
kind: "User",
# A cluster API Server authenticates with a certificate whose CN is == to the FQDN of the cluster.
name: cluster.fqdn,
},
],
},
// This ClusteRole is bound to all humans that log in via prodaccess/prodvider/SSO.
// It should allow viewing of non-sensitive data for debugability and openness.
crViewer: kube.ClusterRole("system:viewer") {
rules: [
{
apiGroups: [""],
resources: [
"nodes",
"namespaces",
"pods",
"configmaps",
"services",
],
verbs: ["list"],
},
{
apiGroups: ["metrics.k8s.io"],
resources: [
"nodes",
"pods",
],
verbs: ["list"],
},
{
apiGroups: ["apps"],
resources: [
"statefulsets",
],
verbs: ["list"],
},
{
apiGroups: ["extensions"],
resources: [
"deployments",
"ingresses",
],
verbs: ["list"],
}
],
},
// This ClusterRole is applied (scoped to personal namespace) to all humans.
crFullInNamespace: kube.ClusterRole("system:admin-namespace") {
rules: [
{
apiGroups: ["", "extensions", "apps"],
resources: ["*"],
verbs: ["*"],
},
{
apiGroups: ["batch"],
resources: ["jobs", "cronjobs"],
verbs: ["*"],
},
],
},
// This ClusterRoleBindings allows root access to cluster admins.
crbAdmins: kube.ClusterRoleBinding("system:admins") {
roleRef: {
apiGroup: "rbac.authorization.k8s.io",
kind: "ClusterRole",
name: "cluster-admin",
},
subjects: [
{
apiGroup: "rbac.authorization.k8s.io",
kind: "User",
name: user + "@hackerspace.pl",
} for user in [
"q3k",
"implr",
"informatic",
]
],
},
podSecurityPolicies: policies.Cluster {},
allowInsecureNamespaces: [
policies.AllowNamespaceInsecure("kube-system"),
policies.AllowNamespaceInsecure("metallb-system"),
# TODO(q3k): fix this?
policies.AllowNamespaceInsecure("ceph-waw2"),
policies.AllowNamespaceInsecure("ceph-waw3"),
policies.AllowNamespaceInsecure("matrix"),
policies.AllowNamespaceInsecure("registry"),
policies.AllowNamespaceInsecure("internet"),
# TODO(implr): restricted policy with CAP_NET_ADMIN and tuntap, but no full root
policies.AllowNamespaceInsecure("implr-vpn"),
],
// Allow all service accounts (thus all controllers) to create secure pods.
crbAllowServiceAccountsSecure: kube.ClusterRoleBinding("policy:allow-all-secure") {
roleRef_: cluster.podSecurityPolicies.secureRole,
subjects: [
{
kind: "Group",
apiGroup: "rbac.authorization.k8s.io",
name: "system:serviceaccounts",
}
],
},
// Calico network fabric
calico: calico.Environment {},
// CoreDNS for this cluster.
dns: coredns.Environment {
cfg+: {
cluster_domains: [
"cluster.local",
cluster.fqdn,
],
},
},
// Metrics Server
metrics: metrics.Environment {},
// Metal Load Balancer
metallb: metallb.Environment {
cfg+: {
peers: [
{
"peer-address": "185.236.240.33",
"peer-asn": 65001,
"my-asn": 65002,
},
],
addressPools: [
{
name: "public-v4-1",
protocol: "bgp",
addresses: [
"185.236.240.48/28",
],
},
{
name: "public-v4-2",
protocol: "bgp",
addresses: [
"185.236.240.112/28"
],
},
],
},
},
// Main nginx Ingress Controller
nginx: nginx.Environment {},
certmanager: certmanager.Environment {},
issuer: kube.ClusterIssuer("letsencrypt-prod") {
spec: {
acme: {
server: "https://acme-v02.api.letsencrypt.org/directory",
email: "bofh@hackerspace.pl",
privateKeySecretRef: {
name: "letsencrypt-prod"
},
http01: {},
},
},
},
// Rook Ceph storage
rook: rook.Operator {
operator+: {
spec+: {
// TODO(q3k): Bring up the operator again when stability gets fixed
// See: https://github.com/rook/rook/issues/3059#issuecomment-492378873
replicas: 1,
},
},
},
// Docker registry
registry: registry.Environment {
cfg+: {
domain: "registry.%s" % [cluster.fqdn],
storageClassName: cfg.storageClassNameParanoid,
objectStorageName: "waw-hdd-redundant-2-object",
},
},
// TLS PKI machinery
pki: pki.Environment(cluster.short, cluster.realm),
// Prodvider
prodvider: prodvider.Environment {
cfg+: {
apiEndpoint: "kubernetes.default.svc.%s" % [cluster.fqdn],
},
},
};
{
k0: {
local k0 = self,
cluster: Cluster("k0", "hswaw.net") {
cluster: cluster.Cluster("k0", "hswaw.net") {
cfg+: {
storageClassNameParanoid: k0.ceph.waw2Pools.blockParanoid.name,
},
metallb+: {
cfg+: {
peers: [
{
"peer-address": "185.236.240.33",
"peer-asn": 65001,
"my-asn": 65002,
},
],
addressPools: [
{
name: "public-v4-1",
protocol: "bgp",
addresses: [
"185.236.240.48/28",
],
},
{
name: "public-v4-2",
protocol: "bgp",
addresses: [
"185.236.240.112/28"
],
},
],
},
},
},
// Docker registry
registry: registry.Environment {
cfg+: {
domain: "registry.%s" % [k0.cluster.fqdn],
storageClassName: k0.cluster.cfg.storageClassNameParanoid,
objectStorageName: "waw-hdd-redundant-2-object",
},
},
// CockroachDB, running on bc01n{01,02,03}.
cockroach: {
waw2: cockroachdb.Cluster("crdb-waw1") {
cfg+: {
@@ -271,6 +65,7 @@ local Cluster(short, realm) = {
{ name: "bc01n02", node: "bc01n02.hswaw.net" },
{ name: "bc01n03", node: "bc01n03.hswaw.net" },
],
// Host path on SSD.
hostPath: "/var/db/crdb-waw1",
},
},
@@ -281,9 +76,10 @@ local Cluster(short, realm) = {
sso: k0.cockroach.waw2.Client("sso"),
},
},
ceph: {
// waw1 cluster - dead as of 2019/08/06, data corruption
// waw2 cluster
// waw2 cluster: shitty 7200RPM 2.5" HDDs
waw2: rook.Cluster(k0.cluster.rook, "ceph-waw2") {
spec: {
mon: {
@@ -378,6 +174,8 @@ local Cluster(short, realm) = {
},
},
},
// waw3: 6TB SAS 3.5" HDDs
waw3: rook.Cluster(k0.cluster.rook, "ceph-waw3") {
spec: {
mon: {
@@ -481,39 +279,60 @@ local Cluster(short, realm) = {
},
},
},
},
# Used for owncloud.hackerspace.pl, which for now lices on boston-packets.hackerspace.pl.
nextcloudWaw3: kube.CephObjectStoreUser("nextcloud") {
metadata+: {
namespace: "ceph-waw3",
},
spec: {
store: "waw-hdd-redundant-3-object",
displayName: "nextcloud",
// Clients for S3/radosgw storage.
clients: {
# Used for owncloud.hackerspace.pl, which for now lives on boston-packets.hackerspace.pl.
nextcloudWaw3: kube.CephObjectStoreUser("nextcloud") {
metadata+: {
namespace: "ceph-waw3",
},
spec: {
store: "waw-hdd-redundant-3-object",
displayName: "nextcloud",
},
},
# nuke@hackerspace.pl's personal storage.
nukePersonalWaw3: kube.CephObjectStoreUser("nuke-personal") {
metadata+: {
namespace: "ceph-waw3",
},
spec: {
store: "waw-hdd-redundant-3-object",
displayName: "nuke-personal",
},
},
# patryk@hackerspace.pl's ArmA3 mod bucket.
cz2ArmaModsWaw3: kube.CephObjectStoreUser("cz2-arma3mods") {
metadata+: {
namespace: "ceph-waw3",
},
spec: {
store: "waw-hdd-redundant-3-object",
displayName: "cz2-arma3mods",
},
},
},
},
# nuke@hackerspace.pl's personal storage.
nukePersonalWaw3: kube.CephObjectStoreUser("nuke-personal") {
metadata+: {
namespace: "ceph-waw3",
},
spec: {
store: "waw-hdd-redundant-3-object",
displayName: "nuke-personal",
},
},
# patryk@hackerspace.pl's ArmA3 mod bucket.
cz2ArmaModsWaw3: kube.CephObjectStoreUser("cz2-arma3mods") {
metadata+: {
namespace: "ceph-waw3",
},
spec: {
store: "waw-hdd-redundant-3-object",
displayName: "cz2-arma3mods",
},
},
# These are policies allowing for Insecure pods in some namespaces.
# A lot of them are spurious and come from the fact that we deployed
# these namespaces before we deployed the draconian PodSecurityPolicy
# we have now. This should be fixed by setting up some more granular
# policies, or fixing the workloads to not need some of the permission
# bits they use, whatever those might be.
# TODO(q3k): fix this?
unnecessarilyInsecureNamespaces: [
policies.AllowNamespaceInsecure("ceph-waw2"),
policies.AllowNamespaceInsecure("ceph-waw3"),
policies.AllowNamespaceInsecure("matrix"),
policies.AllowNamespaceInsecure("registry"),
policies.AllowNamespaceInsecure("internet"),
# TODO(implr): restricted policy with CAP_NET_ADMIN and tuntap, but no full root
policies.AllowNamespaceInsecure("implr-vpn"),
],
},
}