Commit Graph

302 Commits (master)

Author SHA1 Message Date
q3k 0dcc702c64 cluster: bump nearly-expired certs
This makes clustercfg ensure certificates are valid for at least 30
days, and renew them otherwise.

We use this to bump all the certs that were about to expire in a week.
They are now valid until 2021.

There are still some certs that expire in 2020. We need to figure out a
better story for this, especially as the next expiry is 2021 - today's
prod rollout was somewhat disruptive (basically this was done by a full
cluster upgrade-like rollout flow, via clustercfg).

We also drive-by bump the number of mons in ceph-waw3 to 3, as it should
be (this gets rid of a nasty SPOF that would've bitten us during this
upgrade otherwise).
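
For illustration, a minimal sketch of the kind of check clustercfg now
performs (not the actual implementation; file names are placeholders),
using the cryptography library:

    import datetime

    from cryptography import x509
    from cryptography.hazmat.backends import default_backend

    RENEW_MARGIN = datetime.timedelta(days=30)

    def needs_renewal(cert_path: str) -> bool:
        """True if the certificate at cert_path expires in under 30 days."""
        with open(cert_path, "rb") as f:
            cert = x509.load_pem_x509_certificate(f.read(), default_backend())
        return cert.not_valid_after - datetime.datetime.utcnow() < RENEW_MARGIN

    for path in ["ca-kube.crt", "kube-apiserver.crt"]:  # placeholder cert files
        if needs_renewal(path):
            print(f"{path}: expires soon, would re-issue it from the CA")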

Change-Id: Iee050b1b9cba4222bc0f3c7bce9e4cf9b25c8bdc
2020-03-28 18:01:40 +01:00
q3k 90e8e68bab crdb.k0: add bugless-dev (for q3k)
Change-Id: I3988e1c37f0a0c54ef1ba248f01e026d6e8c72b6
2020-03-25 10:55:05 +01:00
q3k e186c87c1b cluster: bump rook to 1.0.6
In preparation for updating to 1.1.0, which will be much more involved.

Also fix a typo in registry.libsonnet, whoops.

Change-Id: I7668bf53c7580f99fdf56fe6227f04a468f8de50
2020-02-21 12:57:02 +01:00
q3k 114edc2398 kube/mirko: add kube.CephObjectStoreUser
Change-Id: I2a67076eeaf41ada41f5ae3ee588025e4c16b9e1
2020-02-18 22:55:13 +01:00
q3k 0d83300b18 cluster: set ceph-waw3 mon replicas to 1
This reflects current production. This needs to get bumped up to 3 at some point as otherwise we lose HA for this cluster.

Change-Id: Ie5937e6a216b635ecbc4c82ecd182a410167c3f8
2020-02-15 11:48:39 +00:00
q3k 58d08595f1 {cluster,}/README: update
Change-Id: Ie211fd34316c407f29506b67187632fd22a4f75b
2020-02-15 01:00:42 +01:00
q3k d7364520e9 cluster: bump kubelets to 1.14.3
Change-Id: I02ed978a49629cdfc3f3587ad640e8cc5a5fad23
2020-02-02 23:43:28 +01:00
q3k e2095b2ce9 cluster: remove unused module-cluster.nix
Change-Id: I819d803fc7454cfd63a11a109ec73c9578f598b8
2020-02-02 23:43:00 +01:00
q3k c78cc13528 cluster/nix: locally build nixos derivations
We change the existing behaviour (copy files & run nixos-rebuild switch)
to something closer to nixops-style. This now means that admin machines
used for provisioning need Nix installed locally, but that's probably an
okay choice to make.

The upside of this approach is that it's easier to debug and test
derivations, as all data is local to the repo and the workstation, and
deploying just means copying a configuration closure and switching the
system to it. At some point we should even be able to run the entire
cluster within a set of test VMs.
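
For reference, a rough sketch of that flow using the standard Nix
tooling (the real logic lives in clustercfg; the configuration path and
host name below are illustrative):

    import subprocess

    def deploy(node: str, config: str) -> None:
        # Build the NixOS system closure locally; nix-build prints its store path.
        out = subprocess.run(
            ["nix-build", "<nixpkgs/nixos>", "-A", "system",
             "-I", f"nixos-config={config}"],
            check=True, capture_output=True, text=True,
        ).stdout.strip()
        # Copy the closure to the target node over SSH.
        subprocess.run(["nix-copy-closure", "--to", f"root@{node}", out], check=True)
        # Switch the node to the new configuration.
        subprocess.run(
            ["ssh", f"root@{node}", f"{out}/bin/switch-to-configuration", "switch"],
            check=True)

    deploy("bc01n03.hswaw.net", "cluster/nix/node.nix")  # illustrative arguments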

We also bump the kubernetes control plane to 1.14. Kubelets are still at
1.13 and their upgrade is coming up today too.

Change-Id: Ia9832c47f258ee223d93893d27946d1161cc4bbd
2020-02-02 22:31:53 +01:00
q3k aa76e55eea cert-manager: fix DNS for http01 k0 splitdns
Change-Id: I73847daec9796cb891cf2fe58c2633c5fa768861
2019-12-29 02:49:30 +01:00
q3k 0c337acf89 benji: fix in waw2, run in waw3
This needed an upstream change to allow only some pools to be backed up,
otherwise benji would crash when stumbling upon the first PVC from a
pool that wasn't backed by the ceph cluster it was acting upon.
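
A simplified sketch of the filtering idea (data structures and pool
names are made up; the real change lives upstream in benji):

    # Only back up PVCs whose pool belongs to the Ceph cluster benji acts upon.
    ALLOWED_POOLS = {"waw-hdd-redundant-1", "waw-hdd-yolo-1"}  # illustrative

    def backup_candidates(pvcs):
        """Yield (pvc, pool) pairs for PVCs living in pools we can reach."""
        for pvc in pvcs:
            pool = pvc["pool"]
            if pool not in ALLOWED_POOLS:
                # Previously benji would crash here; now such PVCs are skipped.
                continue
            yield pvc["name"], pool

    pvcs = [{"name": "data-matrix-0", "pool": "waw-hdd-redundant-1"},
            {"name": "data-legacy-0", "pool": "some-other-cluster-pool"}]
    for name, pool in backup_candidates(pvcs):
        print(f"would back up {name} from {pool}")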

Change-Id: I52bf163c16352cb59fdd3dbdd576145ce1dbac03
2019-12-21 23:45:07 +01:00
q3k ba8e79e8f4 kube-apiserver: fix cert mismatch, again
This time from a bare hscloud checkout to make sure _nothing_ is fucked
up.

This causes no change remotely, just makes the repo reflect reality.

Change-Id: Ie8db01300771268e0371c3cdaf1930c8d7cbfb1a
2019-12-17 02:13:55 +01:00
q3k 050af01b83 cluster: add q3k's new SSH key
Change-Id: I872a75cc89a62c9487433fa5e8e5767953e309c9
2019-12-17 01:58:58 +01:00
q3k e5a956a1c8 *: bump to q3k's kubecfg, kubernetes 1.16
Change-Id: I302876d5a45cbfb63d87ad9f6ea9aaeff7bec17d
2019-11-17 22:38:40 +01:00
q3k fd323a0f55 cluster: sync to prod
Change-Id: If311f1ce44653bb54e0a10ad2fdd65685722a64d
2019-11-17 19:49:04 +01:00
q3k 96c428f7d7 nixops: fix
Change-Id: I15ebde319fcae3f9771da6a549e52783e0ec4409
2019-11-17 19:00:46 +01:00
q3k c33ebcc79f cluster: add ceph-waw3, move metallb to bgp
Change-Id: Iebf369f9a02e44be163ef4afc2e0f23c4b009898
2019-11-01 18:43:45 +01:00
q3k e67f6fec98 cluster/secrets: really try to fix apiserver key/cert
Change-Id: I6b0ea601246b665585adb040b9819344bc683e78
2019-10-31 17:36:44 +01:00
q3k 737cafd548 cluster/certs: fix kube-apiserver
key/cert mismatch :/

Change-Id: I3601a18d3ab1eae4183b59be43c497cd27dfe704
2019-10-31 17:30:48 +01:00
q3k d493ab66ca *: add dcr01s{22,24}
Change-Id: I072e825e2e1d199d9da50b9d38a9ffba68e61182
2019-10-31 17:07:50 +01:00
q3k 6f773e0004 smsgw: productionize, implement kube/mirko
This productionizes smsgw.

We also add some jsonnet machinery to provide a unified service for Go
micro/mirkoservices.

This machinery provides all the nice stuff:
 - a deployment
 - a service for all your types of ports
 - TLS certificates for HSPKI

We also update and test hspki for a new name scheme.

Change-Id: I292d00f858144903cbc8fe0c1c26eb1180d636bc
2019-10-04 13:52:34 +02:00
q3k d186e9468d cluster: move prodvider to kubernetes.default.svc.k0.hswaw.net
In https://gerrit.hackerspace.pl/c/hscloud/+/70 we accidentally
introduced a split-horizon DNS situation:

 - k0.hswaw.net from the Internet resolves to nodes running the k8s API
   servers, and as such can serve API server traffic
 - k0.hswaw.net from the cluster returned no results

This broke prodvider in two ways:
 - it dialed the API servers at k0.hswaw.net
 - even after the endpoint was moved to
   kubernetes.default.svc.k0.hswaw.net, the apiserver cert didn't cover
   that

Thus, not only did we have to change the prodvider endpoint, but we also
had to change the apiserver certs to cover this new name.
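
A quick standalone check that the serving cert now covers the new name
(not part of the change itself; the port is illustrative):

    import ssl

    from cryptography import x509
    from cryptography.hazmat.backends import default_backend

    pem = ssl.get_server_certificate(("k0.hswaw.net", 443))  # port illustrative
    cert = x509.load_pem_x509_certificate(pem.encode(), default_backend())
    sans = cert.extensions.get_extension_for_class(
        x509.SubjectAlternativeName).value.get_values_for_type(x509.DNSName)

    wanted = "kubernetes.default.svc.k0.hswaw.net"
    print(wanted, "is" if wanted in sans else "is NOT", "covered; SANs:", sans)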

I'm not sure this should be the target fix. I think at some point we
should only start referring to in-cluster services via their full (or
cluster.local) names, but right now k0.hswaw.net is an exception that
requires this split, and we have no way to access the internal services
from the outside just yet.

However, getting prodvider to work is important enough that this fix is
IMO good enough for now.

Change-Id: I13d0681208c66f4060acecc78b7ae14b8f8d7125
2019-10-04 13:52:34 +02:00
q3k e31d64f265 kube: move cert-manager resources to kube.local.libsonnet
This way kubernetes consumers don't have to import anything from
cluster/, hopefully.

We also create a small abstraction for local additions to
kube.libsonnet without having to modify upstream.

Change-Id: I209095781f91c8867250a647fe944370cddd67d0
2019-10-02 21:03:13 +02:00
q3k 54490d385e cluster/coredns: add cluster fqdn top level domain
This means that in addition to services being discoverable the 'classic'
way:

    <svcname>.<namespace>.svc.cluster.local

They are now discoverable as:

    <svcname>.<namespace>.svc.<fqdn>

For instance, on k0 you can now internally resolve:

    $ kubectl run --rm -it foo --image=nixery.dev/shell/dnsutils bash
    bash-4.4# dig +short coffee-svc.default.svc.k0.hswaw.net
    10.10.12.192

Change-Id: Ie6875b54ed6358f30f888ca0cd96e011520ace20
2019-10-02 20:49:13 +02:00
q3k 95868eeddc benji: back up daily instead of hourly
Every benji backup seems to cycle blocks (e.g. delete some and recreate
them).

Since wasabi has a minimum billing retention policy of 90 days, this
means that every object uploaded and then deleted an hour later still
costs us.

Currently we seem to be storing around 200G of data in wasabi for Benji
but already have 600G of deleted objects. This is suboptimal.
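
A back-of-the-envelope sketch of the cost impact (storage numbers from
above; the per-GB price is an assumption, not Wasabi's actual quote):

    stored_gb = 200    # live benji data
    deleted_gb = 600   # deleted objects still inside the 90-day billing window
    price_per_gb_month = 0.0059  # assumed USD/GB/month

    billable = stored_gb + deleted_gb
    print(f"billable: {billable} GB, ~${billable * price_per_gb_month:.2f}/month")
    # Backing up daily instead of hourly slows the rate at which freshly
    # deleted blocks enter that 90-day window by roughly 24x.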

This change has already been deployed on production.

Change-Id: I67302d23a1c45974fb5d51ec9a8cff28260830dc
2019-09-26 21:49:24 +00:00
q3k 57515a2525 Merge "rules_pip: update to new version" 2019-09-25 12:05:58 +00:00
q3k 5f9b1ecd67 rules_pip: update to new version
rules_pip has a new version [1] of its rule system, incompatible with the
version we used, that fixes a bunch of issues, notably:
 - explicit tagging of repositories for PY2/PY3/PY23 support
 - removal of dependency on host pip (in exchange for having to vendor
   wheels)
 - higher quality tooling for locking

We update to the newer version of pip_rules, rename the external
repository to pydeps and move requirements.txt, the lockfile and the
newly vendored wheels to third_party/, where they belong.

[1] - https://github.com/apt-itude/rules_pip/issues/16

Change-Id: I1065ee2fc410e52fca2be89fcbdd4cc5a4755d55
2019-09-25 14:05:07 +02:00
q3k 5f3a5e0310 cluster/kube: emergency fixes after eviction
Some pods got evicted. Some of them broke.

  - postgres in matrix and nginx in internet because of the new policies
    (chown issues)
  - cas proxy in matrix because apparently the image was not reuploaded
    to the registry after ceph-waw1 died, and another node didn't have it
  - registry because it had a weak image pin and downgraded to some
    broken version on another node

Change-Id: I836036872629843c8ede1b7f67982112c90d71f0
2019-09-25 02:58:15 +02:00
q3k db2a2a029f Merge "Get in the Cluster, Benji!" 2019-09-18 20:40:12 +00:00
q3k a01c487a6e cluster: allow insecure pods in rook-ceph-system
This is required for the agent to start a socket on each host for
kubelet-to-rook access.

Change-Id: I78529df81185aeaacdcb494138f72f0224a029c6
2019-09-05 16:01:19 +00:00
q3k 13bb1bf4e3 Get in the Cluster, Benji!
Here we introduce benji [1], a backup system based on backy2. It lets us
back up Ceph RBD objects from Rook into Wasabi, our offsite S3-compatible
storage provider.

Benji runs as a k8s CronJob, every hour at 42 minutes. It does the
following:
 - runs benji-pvc-backup, which iterates over all PVCs in k8s, and backs
   up their respective PVs to Wasabi
 - runs benji enforce, marking backups outside our backup policy [2] as
   to be deleted
 - runs benji cleanup, to remove unneeded backups
 - runs a custom script to back up benji's sqlite3 database into wasabi
   (unencrypted, but we're fine with that - the metadata only contains
   image/pool names, i.e. Ceph PV and pool names)

[1] - https://benji-backup.me/index.html
[2] - latest3,hours48,days7,months12, which means: keep the latest 3
      backups, then one backup per hour for the last 48 hours, then one
      per day for the last 7 days, then one per month for the last 12
      months, for a total of 65 backups (deduplicated, of course)

We also drive-by update some docs (make them more separated into
user/admin docs).
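
A condensed sketch of what one run of the CronJob does (invocations
simplified; the sqlite3 backup step is deployment-specific and therefore
only described in a comment):

    import subprocess

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1. Back up every PVC's PV into Wasabi.
    run("benji-pvc-backup")
    # 2. Mark backups falling outside the retention policy as to-be-deleted
    #    (arguments simplified).
    run("benji", "enforce", "latest3,hours48,days7,months12")
    # 3. Remove the backups marked above.
    run("benji", "cleanup")
    # 4. A custom script then dumps benji's sqlite3 metadata database and
    #    uploads it to Wasabi (paths and bucket are deployment-specific).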

Change-Id: Ibe0942fd38bc232399c0e1eaddade3f4c98bc6b4
2019-09-02 16:33:02 +02:00
q3k 9496d9910a cluster: add nextcloud user for object store
Change-Id: Ib08be16f71ff5e1b72ca6ad436de4b12427dd407
2019-09-02 16:33:02 +02:00
q3k 42553cd044 cluster: disable unauthenticated read only port on kubelets
This port was leaking kubelet state, including information on running
pods. No secrets were leaked (as long as they were not pasted in plain
text into env/args), but this still shouldn't be available.

As far as I can tell, nothing depends on this port, other than some
enterprise load balancers that require HTTP for node 'health' checks.
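
A small sanity check that the port is in fact closed on the nodes (the
kubelet read-only port conventionally defaults to 10255; node names are
illustrative):

    import socket

    NODES = ["bc01n01.hswaw.net", "bc01n02.hswaw.net", "bc01n03.hswaw.net"]
    PORT = 10255  # kubelet's unauthenticated read-only port (default value)

    for node in NODES:
        s = socket.socket()
        s.settimeout(2)
        try:
            s.connect((node, PORT))
            print(f"{node}:{PORT} is still open!")
        except OSError:
            print(f"{node}:{PORT} closed, as expected")
        finally:
            s.close()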

Change-Id: I9549b73e0168fe3ea4dce43cbe8fdc2ca4575961
2019-09-02 16:33:02 +02:00
q3k 896926c921 prodvider: clean up LDAP connections
Change-Id: Ic95e6d1b845832fa0fb2da51b418bcdcb8fd05c4
2019-08-31 15:00:51 +02:00
q3k 71a21c7693 rook/ceph: bump
Change-Id: I046df292cad11650adb829cc8a73100cc1d1ecc8
2019-08-30 23:08:26 +02:00
q3k b13b7ffcdb prod{access,vider}: implement
Prodaccess/Prodvider allow issuing short-lived certificates for all SSO
users to access the kubernetes cluster.

Currently, all users get a personal-$username namespace in which they
have administrative rights. Otherwise, they get no access.

In addition, we define a static CRB to allow some admins access to
everything. In the future, this will be more granular.
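
For illustration, a rough sketch of the per-user objects this sets up,
using the kubernetes Python client (not the actual implementation; the
binding name is a placeholder):

    from kubernetes import client, config

    def ensure_personal_namespace(username: str) -> None:
        """Create personal-$username and grant that user admin rights in it only."""
        config.load_kube_config()
        core = client.CoreV1Api()
        rbac = client.RbacAuthorizationV1Api()

        ns = f"personal-{username}"
        core.create_namespace({"metadata": {"name": ns}})
        rbac.create_namespaced_role_binding(ns, {
            "metadata": {"name": f"{username}-admin"},  # placeholder name
            "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                        "kind": "ClusterRole", "name": "admin"},
            "subjects": [{"kind": "User", "name": username,
                          "apiGroup": "rbac.authorization.k8s.io"}],
        })

    ensure_personal_namespace("q3k")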

We also update relevant documentation.

Change-Id: Ia18594eea8a9e5efbb3e9a25a04a28bbd6a42153
2019-08-30 23:08:18 +02:00
q3k d16454badc cert-manager: bump to v0.9.1
We just got this email:

We've been working with Jetstack, the authors of cert-manager, on a
series of fixes to the client. Cert-manager sometimes falls into a
traffic pattern where it sends really excessive traffic to Let's
Encrypt's servers, continuously. To mitigate this, we plan to start
blocking all traffic from cert-manager versions less than 0.8.0 (the
current semver minor release), as of November 1, 2019. Please upgrade
all of your cert-manager instances before then.

We're sending this email because this is the contact address of your
cert-manager instance at:

 185.236.240.37 .

Version 0.8.0 is much better but we still observe excessive traffic in
some cases. We're working with Jetstack to improve these cases. As new
versions of cert-manager are released, we will add the non-current
versions to our block list after 3 months. We strongly encourage
cert-manager users to stay up-to-date with new versions.

Also, there is an opportunity to help both Jetstack and Let's Encrypt.
Once you've upgraded, please check the logs for your cert-manager
instances from time to time. Are they making excessive requests to Let's
Encrypt (more than, say, 10 per day over multiple days)? If so, please
share details at https://github.com/jetstack/cert-manager/issues/1948 .

Thanks,
Let's Encrypt Team

Change-Id: Ic7152150ac1c96941423878c6d4b6209e07429cf
2019-08-29 17:21:49 +02:00
q3k 1fad2e5c6e bgpwtf/cccampix: draw the rest of the fucking owl
Change-Id: I49fd5906e69512e8f2d414f406edc0179522f225
2019-08-11 23:43:25 +02:00
q3k d533892efa Fix crdb-waw1
We accidentally created crdb-waw2 in
https://gerrit.hackerspace.pl/c/hscloud/+/2.

We remove it now and also backport a manual change that makes the
crdb-waw1 service public via a LoadBalancer.

Change-Id: I3bbd6f01b82c6efa458cc44776f086ba36e9f20c
2019-08-11 23:42:47 +02:00
q3k d07861b7df ceph-waw1 -> ceph-waw2
Change-Id: I03d6244b9697a9efc06492114ef90cdb01e17601
2019-08-08 17:49:31 +02:00
q3k f774f2f31d Merge "app/registry: integrate into cluster/kube" 2019-08-02 00:28:10 +00:00
q3k 654c70dad7 cluster/tools/install.sh: fix nixops graceful degradation
Nixops requires nix_rules, which in turn requires a working nix
installation.

When we split tools/install.sh into tools/install.sh and
cluster/tools/install.sh [1], we accidentally made the latter always install
all cluster tools, including nixops - even if the install.sh script
detected that the system does not have Nix installed.

[1] - https://gerrit.hackerspace.pl/c/hscloud/+/81
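
A sketch of the intended graceful degradation (the real scripts are
shell; the Bazel target names are placeholders):

    import shutil
    import subprocess

    CLUSTER_TOOLS = ["//cluster/tools:kubectl", "//cluster/tools:kubecfg"]  # placeholders
    NIX_TOOLS = ["//cluster/tools:nixops"]  # needs nix_rules, i.e. a local Nix

    targets = list(CLUSTER_TOOLS)
    if shutil.which("nix"):
        targets += NIX_TOOLS
    else:
        print("nix not found; skipping nixops")
    subprocess.run(["bazel", "build"] + targets, check=True)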

Change-Id: Ib5357cfe125f1393b395b28062787f3f0091f549
2019-07-23 01:37:11 +02:00
q3k 4d61d20aec app/registry: integrate into cluster/kube
This makes a registry be automatically part of the cluster
infrastructure.

Tested by running kubecfg diff, no diffs (apart from out-of-date ACLs)
found.

Change-Id: Ic0635e789cf3fb851f410bcf2865326f1fa87545
2019-07-21 16:56:41 +02:00
q3k 1663e0e93b tools: move cluster-specific stuff to cluster/tools
Change-Id: I1813bb221d1bff0d6067eceb84d23510face60ff
2019-07-21 14:26:51 +00:00
q3k 116da981c9 nix/ -> cluster/nix/
These are related to cluster bootstrapping, not generic language
libraries (like go/ and bzl/).

Change-Id: I03a83c64f3e0fa6cb615d36b4e618f5e92d886ec
2019-07-21 15:53:20 +02:00
Serge Bazanski 2ce367681a *: move away from python_rules
python_rules is completely broken when it comes to py2/py3 support.

Here, we replace it with native python rules from new Bazel versions [1] and rules_pip for PyPI dependencies [2].

rules_pip is somewhat little known and experimental, but it seems to work much better than what we had previously.

We also unpin rules_docker and fix .bazelrc to force Bazel into Python 2 mode - hopefully, this repo will now work
fine under operating systems where `python` is python2 (as the standard dictates).

[1] - https://docs.bazel.build/versions/master/be/python.html

[2] - https://github.com/apt-itude/rules_pip

Change-Id: Ibd969a4266db564bf86e9c96275deffb9610dd44
2019-07-16 22:22:05 +00:00
q3k 92be486f39 Revert "cluster/kube/lib/nginx: use Local traffic policy"
This reverts commit 09a0f06d2a.

Reason for revert: prevents registry from being accessible on nodes:

q3k@anathema ~/Software/hscloud $ curl registry.k0.hswaw.net
<html>
[..., ok]

[root@bc01n03:~]# curl registry.k0.hswaw.net
^C

Change-Id: I0da97aaf7a8791ea3f62c70b6c1502f4a48a300f
2019-06-29 22:58:19 +00:00
q3k 09a0f06d2a cluster/kube/lib/nginx: use Local traffic policy
Diff against prod:

  - live services nginx-system.ingress-nginx
  + config services nginx-system.ingress-nginx
    {
      "apiVersion": "v1",
      "kind": "Service",
      "metadata": {
        "annotations": {},
        "labels": {
          "app.kubernetes.io/name": "ingress-nginx",
          "app.kubernetes.io/part-of": "ingress-nginx"
        },
        "name": "ingress-nginx",
        "namespace": "nginx-system"
      },
      "spec": {
  -     "externalTrafficPolicy": "Cluster",
  +     "externalTrafficPolicy": "Local",
        "ports": [
          {
            "name": "ssh",
            "port": 22,
            "protocol": "TCP",
            "targetPort": 22
          },
          {
            "name": "http",
            "port": 80,
            "protocol": "TCP",
            "targetPort": 80
          },
          {
            "name": "https",
            "port": 443,
            "protocol": "TCP",
            "targetPort": 443
          }
        ],
        "selector": {
          "app.kubernetes.io/name": "ingress-nginx",
          "app.kubernetes.io/part-of": "ingress-nginx"
        },
        "type": "LoadBalancer"
      }
    }

Change-Id: I0dd66e3f1643efa975d6180cc163a265d4b484ef
2019-06-29 22:44:53 +02:00
q3k 543b412a65 cluster/kube/lib/nginx: add gerrit forwarding
This has already been running in production since gerrit was deployed -
it just got lost during submit.

Change-Id: I8a1580b1ca3ec3142a8fa4320dc9f51a599a914f
2019-06-29 22:42:39 +02:00
q3k 59f5fd315c cluster/openssl.cnf: remove
This was used in the old openssl-based TLS certificate generation code.

Change-Id: I5da8c5b012b6af8c2f8b990237b3c4933b90a349
2019-06-25 15:02:45 +02:00
q3k 184678b0f4 cluster/kube/lib/cockroachdb: clean up topology
IP addresses are not necessary in the topology definitions of a
cockroach cluster.

They were mis-committed leftovers from trying to run the cluster on
DaemonSets with hostNetworking: true.

Change-Id: I4ef1f6ed9a745efc6b05846bc13aba9d1f8dc7c8
2019-06-22 21:18:29 +00:00
q3k dec401c7dd cluster/kube/lib/cockroach: move client to deployment
This prevents a bug where kubecfg fails to update the client pod when
running a cluster/kube/cluster.jsonnet update. The pod update is
attempted because of runtime/intent differences at serviceAccounts
specification, which causes kubecfg to see a diff, which causes it to
attempt an update, which causes kube-apiserver to reject the change
(because pods are immutable), which causes kubecfg to fail.

Change-Id: I20b0ecbb264213a2eb483d475c7683b4965c82be
2019-06-22 23:14:25 +02:00
q3k c7258f4644 cluster/kube: refactor, add crdb-waw1 2019-06-21 00:24:09 +02:00
q3k e53e39a8be cluster/kube/lib/cockroachdb: use manual node pinning
We move away from the StatefulSet based deployment to manually starting
a deployment per intended node. This allows us to pin individual
instances of Cockroach to particular nodes, so that they stay
co-located with their data.
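
A minimal sketch of the pinning idea (node names and image tag are
placeholders): one Deployment per node, with a nodeSelector keeping each
instance on the node that holds its data.

    def crdb_deployment(node: str) -> dict:
        """A heavily trimmed Deployment manifest pinned to `node`."""
        name = f"crdb-waw1-{node.split('.')[0]}"
        return {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": name},
            "spec": {
                "replicas": 1,
                "selector": {"matchLabels": {"app": name}},
                "template": {
                    "metadata": {"labels": {"app": name}},
                    "spec": {
                        # Keep this instance on its node, next to its local data.
                        "nodeSelector": {"kubernetes.io/hostname": node},
                        "containers": [{"name": "cockroachdb",
                                        "image": "cockroachdb/cockroach:v19.1.0"}],
                    },
                },
            },
        }

    for node in ["bc01n01.hswaw.net", "bc01n02.hswaw.net", "bc01n03.hswaw.net"]:
        print(crdb_deployment(node)["metadata"]["name"])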
2019-06-20 23:36:35 +02:00
q3k 662a3cdcca cluster/kube/lib/cockroachdb: refactor
We refactor this library to:

 - support multiple databases, but with a strong suggestion of having
   one per k8s cluster
 - drop the database creation logic
 - redo naming (allowing for two options: multiple clusters per
   namespace or an exclusive namespace for the cluster)
 - unhardcode dns names
2019-06-20 19:45:03 +02:00
q3k 224a50bbfe cluster/kube/lib/cockroach: fix imports 2019-06-20 16:43:01 +02:00
q3k 3c117fa841 make cockroachdb into a cluster service 2019-06-20 16:43:01 +02:00
q3k c3b0f7627c cluster/kube: set operator replicas to 0 2019-06-20 16:42:19 +02:00
q3k c0fc3ee442 cluster/clustercfg: add clustercfg-nocerts 2019-06-20 16:11:38 +02:00
q3k f970a7ef0f nix/cluster-configuration: fix CNI plugins being deleted on kubelet restart 2019-06-20 12:51:51 +02:00
q3k f81f7d462a cluster/clustercfg: gitignore __pycache__ 2019-05-19 03:11:18 +02:00
q3k aa68f3fdd8 secretstore: add implr 2019-05-18 00:15:25 +02:00
q3k 36cc4fb61a bazel-cache: deploy, add waw-hdd-yolo-1 ceph pool 2019-05-17 18:09:39 +02:00
informatic fc514a9b52 cluster/kube/cert-manager: don't add APIService when webhooks are disabled 2019-05-05 12:12:13 +02:00
informatic b187bf5b2c cluster/kube/metallb: downgrade to 0.7.3 2019-05-05 12:11:14 +02:00
q3k 321fad9865 cluster/kube/rook: lower debug 2019-04-19 14:14:36 +02:00
q3k ed2e670c8b cluster/kube/rook: bump to ceph v14 fully 2019-04-19 13:27:20 +02:00
informatic 56918237ed cluster: update ceph README 2019-04-09 23:48:33 +02:00
informatic 5ac85c6e73 cluster/kube: refactor rook.io object store configuration 2019-04-09 21:45:32 +02:00
informatic 6da3b288dc WIP: app/registry: ceph object storage 2019-04-09 13:48:21 +02:00
informatic e24ccd678c clustercfg: fix broken admincreds generation 2019-04-09 13:43:54 +02:00
informatic 598a079f57 clustercfg: extract cfssl handling to separate function 2019-04-09 13:29:33 +02:00
q3k 73cef11c85 *: rejigger tls certs and more
This pretty large change does the following:

 - moves nix from bootstrap.hswaw.net to nix/
 - changes clustercfg to use cfssl (sketched below) and moves it to cluster/clustercfg
 - changes clustercfg to source information about target location of
   certs from nix
 - changes clustercfg to push nix config
 - changes tls certs to have more than one CA
 - recalculates all TLS certs
   (it keeps the old serviceaccounts key, otherwise we end up with
   invalid serviceaccounts - the cert doesn't match, but who cares,
   it's not used anyway)
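
A sketch of how such cfssl invocations look (standard cfssl/cfssljson
flags; the profile and file names are placeholders):

    import subprocess

    def gencert(name: str, csr_json: str) -> None:
        """Issue a cert signed by our CA, writing name.pem / name-key.pem."""
        cfssl = subprocess.Popen(
            ["cfssl", "gencert",
             "-ca", "ca.pem", "-ca-key", "ca-key.pem",
             "-config", "ca-config.json", "-profile", "server",
             csr_json],
            stdout=subprocess.PIPE)
        # cfssljson splits cfssl's JSON output into the actual .pem files.
        subprocess.run(["cfssljson", "-bare", name], stdin=cfssl.stdout, check=True)
        cfssl.wait()

    gencert("kube-apiserver", "kube-apiserver-csr.json")  # placeholder names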
2019-04-07 00:06:23 +02:00
q3k 242152f65e cluster/kube/lib/metallb: bump memory hoping to prevent crashes 2019-04-04 16:54:00 +02:00
q3k 0f78cea802 Merge branch 'master' of hackerspace.pl:hscloud 2019-04-02 14:45:23 +02:00
q3k 2fd5861d24 cluster: some doc updates 2019-04-02 14:45:17 +02:00
informatic 3187c59a86 cluster/kube: ceph dashboard tls certificates 2019-04-02 14:44:04 +02:00
informatic 2afe604595 cluster/kube: minor cert-manager cleanups, disable webhooks by default 2019-04-02 14:43:34 +02:00
informatic 79ddbc57d9 cluster/kube: initial cert-manager implementation 2019-04-02 13:20:15 +02:00
q3k 65f3b1d8ab cluster/kube: add waw-hdd-redundant-1 pool/storageclass 2019-04-02 01:05:38 +02:00
q3k c6da127d3f cluster/kube: ceph-waw1 up 2019-04-02 00:06:13 +02:00
q3k cdfafaf91e cluster/kube: finish rook operator 2019-04-01 19:16:18 +02:00
q3k b7fcc67f42 cluster/kube: start implementing rook 2019-04-01 18:40:50 +02:00
q3k 14cbacb81a cluster/kube/metallb: parametrize address pools 2019-04-01 18:00:44 +02:00
q3k a9c7e86687 cluster: fix metallb, add nginx ingress controller 2019-04-01 17:56:28 +02:00
q3k eeed6fb6da recertify all certs 2019-04-01 16:19:28 +02:00
q3k 1e565dc4a5 cluster: start implementing metallb 2019-01-18 09:40:59 +01:00
q3k e3af1eb852 cluster: autodetect IP address
This is so that Calico starts with the proper subnet. Feeding it just an
IP from the node status will mean it parses it as /32 and uses IPIP
tunnels for all connectivity.
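
An illustration of the difference, using Python's ipaddress module (the
address and prefix length are just examples):

    import ipaddress

    node_ip = "185.236.240.37"

    # Fed a bare IP from the node status, Calico ends up with a /32 host route,
    # so every peer looks off-subnet and traffic goes through IPIP tunnels.
    print(ipaddress.ip_interface(node_ip).network)          # 185.236.240.37/32

    # With the interface's real prefix autodetected, peers on the same subnet
    # are reached directly.
    detected = ipaddress.ip_interface(f"{node_ip}/24").network
    print(detected)                                         # 185.236.240.0/24
    print(ipaddress.ip_address("185.236.240.38") in detected)  # True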
2019-01-18 09:39:57 +01:00
q3k 41bd2b52c2 cluster/secrets: add implr 2019-01-17 23:37:36 +01:00
q3k f3010ee1cb cluster/secrets: add cz2 2019-01-17 21:35:52 +01:00
q3k dc9c29ac90 cluster: add calico key 2019-01-17 21:35:28 +01:00
q3k 5c75574464 cluster/coredns: allow resolving via <svc>.<namespace>.svc.k0.hswaw.net 2019-01-17 21:35:10 +01:00
q3k af3be426ad cluster: deploy calico and metrics service 2019-01-17 18:57:19 +01:00
q3k 49b9a13d28 cluster: deploy coredns 2019-01-14 00:02:59 +01:00
q3k 5bebbebe3e cluster/kube: fix typo 2019-01-13 22:08:05 +01:00
q3k 4d9e72cb8c cluster/kube: init 2019-01-13 22:06:33 +01:00
q3k d89e1203d9 ca: bump srl 2019-01-13 22:06:11 +01:00
q3k ae56b6a6a5 clustercfg: create .kubectl 2019-01-13 21:39:16 +01:00
q3k cd23740185 cluster/secrets: keep plain/ dir for scripting 2019-01-13 21:37:35 +01:00
q3k de061801db *: k0.hswaw.net somewhat working 2019-01-13 21:14:02 +01:00
q3k f2a812b9fd *: bazelify 2019-01-13 17:51:34 +01:00
q3k 60b19af41e *: reorganize 2019-01-13 14:15:09 +01:00