1
0
Fork 0
Commit Graph

215 Commits (aef13358c8456a16ce65eb97e876ba1b56b8ced8)

Author SHA1 Message Date
q3k 1fad2e5c6e bgpwtf/cccampix: draw the rest of the fucking owl
Change-Id: I49fd5906e69512e8f2d414f406edc0179522f225
2019-08-11 23:43:25 +02:00
q3k d533892efa Fix crdb-waw1
We accidentally created crdb-waw2 in
https://gerrit.hackerspace.pl/c/hscloud/+/2.

We remove it now and also backport a manual change that makes the
crdb-waw1 service public via a LoadBalancer.

Change-Id: I3bbd6f01b82c6efa458cc44776f086ba36e9f20c
2019-08-11 23:42:47 +02:00
q3k d07861b7df ceph-waw1 -> ceph-waw2
Change-Id: I03d6244b9697a9efc06492114ef90cdb01e17601
2019-08-08 17:49:31 +02:00
q3k f774f2f31d Merge "app/registry: integrate into cluster/kube" 2019-08-02 00:28:10 +00:00
q3k 654c70dad7 cluster/tools/install.sh: fix nixops graceful degradation
Nixops requires nix_rules, which in turn requires a working nix
installation.

When we split tools/install.sh into tools/install.sh and
cluster/tools/install.sh [1], we accidentally made the latter always install
all cluster tools, including nixops - even if the install.sh script
detected that the system does not have Nix installed.

[1] - https://gerrit.hackerspace.pl/c/hscloud/+/81

Change-Id: Ib5357cfe125f1393b395b28062787f3f0091f549
2019-07-23 01:37:11 +02:00
q3k 4d61d20aec app/registry: integrate into cluster/kube
This makes a registry be automatically part of the cluster
infrastructure.

Tested by running kubecfg diff, no diffs (apart from out-of-date ACLs)
found.

Change-Id: Ic0635e789cf3fb851f410bcf2865326f1fa87545
2019-07-21 16:56:41 +02:00
q3k 1663e0e93b tools: move cluster-specific stuff to cluster/tools
Change-Id: I1813bb221d1bff0d6067eceb84d23510face60ff
2019-07-21 14:26:51 +00:00
q3k 116da981c9 nix/ -> cluster/nix/
These are related to cluster bootstrapping, not generic language
libraries (like go/ and bzl/).

Change-Id: I03a83c64f3e0fa6cb615d36b4e618f5e92d886ec
2019-07-21 15:53:20 +02:00
Serge Bazanski 2ce367681a *: move away from python_rules
python_rules is completely broken when it comes to py2/py3 support.

Here, we replace it with native python rules from new Bazel versions [1] and rules_pip for PyPI dependencies [2].

rules_pip is somewhat little known and experimental, but it seems to work much better than what we had previously.

We also unpin rules_docker and fix .bazelrc to force Bazel into Python 2 mode - hopefully, this repo will now work
fine under operating systems where `python` is python2 (as the standard dictates).

[1] - https://docs.bazel.build/versions/master/be/python.html

[2] - https://github.com/apt-itude/rules_pip

Change-Id: Ibd969a4266db564bf86e9c96275deffb9610dd44
2019-07-16 22:22:05 +00:00
q3k 92be486f39 Revert "cluster/kube/lib/nginx: use Local traffic policy"
This reverts commit 09a0f06d2a.

Reason for revert: prevents registry from being accessible on nodes:

q3k@anathema ~/Software/hscloud $ curl registry.k0.hswaw.net
<html>
[..., ok]

[root@bc01n03:~]# curl registry.k0.hswaw.net
^C

Change-Id: I0da97aaf7a8791ea3f62c70b6c1502f4a48a300f
2019-06-29 22:58:19 +00:00
q3k 09a0f06d2a cluster/kube/lib/nginx: use Local traffic policy
Diff against prod:

  - live services nginx-system.ingress-nginx
  + config services nginx-system.ingress-nginx
    {
      "apiVersion": "v1",
      "kind": "Service",
      "metadata": {
        "annotations": {},
        "labels": {
          "app.kubernetes.io/name": "ingress-nginx",
          "app.kubernetes.io/part-of": "ingress-nginx"
        },
        "name": "ingress-nginx",
        "namespace": "nginx-system"
      },
      "spec": {
  -     "externalTrafficPolicy": "Cluster",
  +     "externalTrafficPolicy": "Local",
        "ports": [
          {
            "name": "ssh",
            "port": 22,
            "protocol": "TCP",
            "targetPort": 22
          },
          {
            "name": "http",
            "port": 80,
            "protocol": "TCP",
            "targetPort": 80
          },
          {
            "name": "https",
            "port": 443,
            "protocol": "TCP",
            "targetPort": 443
          }
        ],
        "selector": {
          "app.kubernetes.io/name": "ingress-nginx",
          "app.kubernetes.io/part-of": "ingress-nginx"
        },
        "type": "LoadBalancer"
      }
    }

Change-Id: I0dd66e3f1643efa975d6180cc163a265d4b484ef
2019-06-29 22:44:53 +02:00
q3k 543b412a65 cluster/kube/lib/nginx: add gerrit forwarding
This is already running in production since gerrit was deployed - it
just got lost during submit.

Change-Id: I8a1580b1ca3ec3142a8fa4320dc9f51a599a914f
2019-06-29 22:42:39 +02:00
q3k 59f5fd315c cluster/openssl.cnf: remove
This was used in the old openssl-based TLS certificate generation code.

Change-Id: I5da8c5b012b6af8c2f8b990237b3c4933b90a349
2019-06-25 15:02:45 +02:00
q3k 184678b0f4 cluster/cube/lib/cockroachdb: clean up topology
IP addresses are not necessary in the topology definitions of a
cockroach cluster.

They were mis-commited leftovers from trying to run the cluster on
DaemonSets with hostNetworking: true.

Change-Id: I4ef1f6ed9a745efc6b05846bc13aba9d1f8dc7c8
2019-06-22 21:18:29 +00:00
q3k dec401c7dd cluster/kube/lib/cockroach: move client to deployment
This prevents a bug where kubecfg fails to update the client pod when
running a cluster/kube/cluster.jsonnet update. The pod update is
attempted because of runtime/intent differences at serviceAccounts
specification, which causes kubecfg to see a diff, which causes it to
attempt and update, which causes kube-apiserver to reject the change
(because pods are immutable), which causes kubecfg to fail.

Change-Id: I20b0ecbb264213a2eb483d475c7683b4965c82be
2019-06-22 23:14:25 +02:00
q3k c7258f4644 cluster/kube: refactor, add crdb-waw1 2019-06-21 00:24:09 +02:00
q3k e53e39a8be cluster/kube/lib/cockroachdb: use manual node pinning
We move away from the StatefulSet based deployment to manually starting
a deployment per intended node. This allows us to pin indivisual
instances of Cockroach to particular nodes, so that they state
co-located with their data.
2019-06-20 23:36:35 +02:00
q3k 662a3cdcca cluster/kube/lib/cockroachdb: refactor
We refactor this library to:

 - support multiple databases, but with a strong suggestion of having
   one per k8s cluster
 - drop the database creation logic
 - redo naming (allowing for two options: multiple clusters per
   namespace or an exclusive namespace for the cluster)
 - unhardcode dns names
2019-06-20 19:45:03 +02:00
q3k 224a50bbfe cluster/kube/lib/cockroach: fix imports 2019-06-20 16:43:01 +02:00
q3k 3c117fa841 make cockroachdb into a cluster service 2019-06-20 16:43:01 +02:00
q3k c3b0f7627c cluster/kube: set operator replicas to 0 2019-06-20 16:42:19 +02:00
q3k c0fc3ee442 cluster/clustercfg: add clustercfg-nocerts 2019-06-20 16:11:38 +02:00
q3k f970a7ef0f nix/cluster-configuration: fix CNI plugins being deleted on kubelet restart 2019-06-20 12:51:51 +02:00
q3k f81f7d462a cluster/clustercfg: gitignore __pycache__ 2019-05-19 03:11:18 +02:00
q3k aa68f3fdd8 secretstore: add implr 2019-05-18 00:15:25 +02:00
q3k 36cc4fb61a bazel-cache: deploy, add waw-hdd-yolo-1 ceph pool 2019-05-17 18:09:39 +02:00
informatic fc514a9b52 cluster/kube/cert-manager: don't add APIService when webhooks are disabled 2019-05-05 12:12:13 +02:00
informatic b187bf5b2c cluster/kube/metallb: downgrade to 0.7.3 2019-05-05 12:11:14 +02:00
q3k 321fad9865 cluster/kube/rook: lower debug 2019-04-19 14:14:36 +02:00
q3k ed2e670c8b cluster/kube/rook: bump to ceph v14 fully 2019-04-19 13:27:20 +02:00
informatic 56918237ed cluster: update ceph README 2019-04-09 23:48:33 +02:00
informatic 5ac85c6e73 cluster/kube: refactor rook.io object store configuration 2019-04-09 21:45:32 +02:00
informatic 6da3b288dc WIP: app/registry: ceph object storage 2019-04-09 13:48:21 +02:00
informatic e24ccd678c clustercfg: fix broken admincreds generation 2019-04-09 13:43:54 +02:00
informatic 598a079f57 clustercfg: extract cfssl handling to separate function 2019-04-09 13:29:33 +02:00
q3k 73cef11c85 *: rejigger tls certs and more
This pretty large change does the following:

 - moves nix from bootstrap.hswaw.net to nix/
 - changes clustercfg to use cfssl and moves it to cluster/clustercfg
 - changes clustercfg to source information about target location of
   certs from nix
 - changes clustercfg to push nix config
 - changes tls certs to have more than one CA
 - recalculates all TLS certs
   (it keeps the old serviceaccoutns key, otherwise we end up with
   invalid serviceaccounts - the cert doesn't match, but who cares,
   it's not used anyway)
2019-04-07 00:06:23 +02:00
q3k 242152f65e cluster/kube/lib/metallb: bump memory hoping to prevent crashes 2019-04-04 16:54:00 +02:00
q3k 0f78cea802 Merge branch 'master' of hackerspace.pl:hscloud 2019-04-02 14:45:23 +02:00
q3k 2fd5861d24 cluster: some doc updates 2019-04-02 14:45:17 +02:00
informatic 3187c59a86 cluster/kube: ceph dashboard tls certificates 2019-04-02 14:44:04 +02:00
informatic 2afe604595 cluster/kube: minor cert-manager cleanups, disable webhooks by default 2019-04-02 14:43:34 +02:00
informatic 79ddbc57d9 cluster/kube: initial cert-manager implementation 2019-04-02 13:20:15 +02:00
q3k 65f3b1d8ab cluster/kube: add waw-hdd-redundant-1 pool/storageclass 2019-04-02 01:05:38 +02:00
q3k c6da127d3f cluster/kube: ceph-waw1 up 2019-04-02 00:06:13 +02:00
q3k cdfafaf91e cluster/kube: finish rook operator 2019-04-01 19:16:18 +02:00
q3k b7fcc67f42 cluster/kube: start implementing rook 2019-04-01 18:40:50 +02:00
q3k 14cbacb81a cluster/kube/metallb: parametrize address pools 2019-04-01 18:00:44 +02:00
q3k a9c7e86687 cluster: fix metallb, add nginx ingress controller 2019-04-01 17:56:28 +02:00
q3k eeed6fb6da recertify all certs 2019-04-01 16:19:28 +02:00
q3k 1e565dc4a5 cluster: start implementing metallb 2019-01-18 09:40:59 +01:00
q3k e3af1eb852 cluster: autodetect IP address
This is so that Calico starts with the proper subnet. Feeding it just an
IP from the node status will mean it parses it as /32 and uses IPIP
tunnels for all connectivity.
2019-01-18 09:39:57 +01:00
q3k 41bd2b52c2 cluster/secrets: add implr 2019-01-17 23:37:36 +01:00
q3k f3010ee1cb cluster/secrets: add cz2 2019-01-17 21:35:52 +01:00
q3k dc9c29ac90 cluster: add calico key 2019-01-17 21:35:28 +01:00
q3k 5c75574464 cluster/coredns: allow resolving via <svc>.<namespace>.svc.k0.hswaw.net 2019-01-17 21:35:10 +01:00
q3k af3be426ad cluster: deploy calico and metrics service 2019-01-17 18:57:19 +01:00
q3k 49b9a13d28 cluster: deploy coredns 2019-01-14 00:02:59 +01:00
q3k 5bebbebe3e cluster/kube: fix typo 2019-01-13 22:08:05 +01:00
q3k 4d9e72cb8c cluster/kube: init 2019-01-13 22:06:33 +01:00
q3k d89e1203d9 ca: bump srl 2019-01-13 22:06:11 +01:00
q3k ae56b6a6a5 clustercfg: create .kubectl 2019-01-13 21:39:16 +01:00
q3k cd23740185 cluster/secrets: keep plain/ dir for scripting 2019-01-13 21:37:35 +01:00
q3k de061801db *: k0.hswaw.net somewhat working 2019-01-13 21:14:02 +01:00
q3k f2a812b9fd *: bazelify 2019-01-13 17:51:34 +01:00
q3k 60b19af41e *: reorganize 2019-01-13 14:15:09 +01:00