This port was leaking kubelet state, including information on running
pods. No secrets were leaked (unless they had been pasted in plain
text into env/args), but this still shouldn't be available.
As far as I can tell, nothing depends on this port, other than some
enterprise load balancers that require HTTP for node 'health' checks.
Change-Id: I9549b73e0168fe3ea4dce43cbe8fdc2ca4575961
Prodaccess/Prodvider allow issuing short-lived certificates for all SSO
users to access the kubernetes cluster.
Currently, all users get a personal-$username namespace in which they
have administrative rights. Otherwise, they get no access.
In addition, we define a static ClusterRoleBinding (CRB) to allow some
admins access to everything. In the future, this will be more granular.
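Roughly, the objects this implies look like the following jsonnet
sketch (names, role choices and the admin subject are illustrative
assumptions, not the actual prodvider output):

  // Sketch only: per-user namespace plus an admin RoleBinding in it,
  // and a single static CRB for cluster admins.
  {
    userNamespace(username):: {
      apiVersion: "v1",
      kind: "Namespace",
      metadata: { name: "personal-" + username },
    },
    userAdmin(username):: {
      apiVersion: "rbac.authorization.k8s.io/v1",
      kind: "RoleBinding",
      metadata: { name: "admin", namespace: "personal-" + username },
      roleRef: {
        apiGroup: "rbac.authorization.k8s.io",
        kind: "ClusterRole",
        name: "admin",  // built-in aggregated admin role
      },
      subjects: [
        { apiGroup: "rbac.authorization.k8s.io", kind: "User", name: username },
      ],
    },
    adminsCRB:: {
      apiVersion: "rbac.authorization.k8s.io/v1",
      kind: "ClusterRoleBinding",
      metadata: { name: "admins" },
      roleRef: {
        apiGroup: "rbac.authorization.k8s.io",
        kind: "ClusterRole",
        name: "cluster-admin",
      },
      subjects: [
        // placeholder subject
        { apiGroup: "rbac.authorization.k8s.io", kind: "User", name: "someadmin" },
      ],
    },
  }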
We also update relevant documentation.
Change-Id: Ia18594eea8a9e5efbb3e9a25a04a28bbd6a42153
We just got this email:
We've been working with Jetstack, the authors of cert-manager, on a
series of fixes to the client. Cert-manager sometimes falls into a
traffic pattern where it sends really excessive traffic to Let's
Encrypt's servers, continuously. To mitigate this, we plan to start
blocking all traffic from cert-manager versions less than 0.8.0 (the
current semver minor release), as of November 1, 2019. Please upgrade
all of your cert-manager instances before then.
We're sending this email because this is the contact address of your
cert-manager instance at:
185.236.240.37 .
Version 0.8.0 is much better but we still observe excessive traffic in
some cases. We're working with Jetstack to improve these cases. As new
versions of cert-manager are released, we will add the non-current
versions to our block list after 3 months. We strongly encourage
cert-manager users to stay up-to-date with new versions.
Also, there is an opportunity to help both Jetstack and Let's Encrypt.
Once you've upgraded, please check the logs for your cert-manager
instances from time to time. Are they making excessive requests to Let's
Encrypt (more than, say, 10 per day over multiple days)? If so, please
share details at https://github.com/jetstack/cert-manager/issues/1948 .
Thanks,
Let's Encrypt Team
Change-Id: Ic7152150ac1c96941423878c6d4b6209e07429cf
We accidentally created crdb-waw2 in
https://gerrit.hackerspace.pl/c/hscloud/+/2.
We remove it now and also backport a manual change that makes the
crdb-waw1 service public via a LoadBalancer.
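The backported change amounts to a Service of type LoadBalancer in
front of the crdb-waw1 pods, roughly like this (selector labels are an
assumption; 26257/8080 are just the CockroachDB defaults):

  {
    apiVersion: "v1",
    kind: "Service",
    metadata: { name: "crdb-waw1-public", namespace: "crdb-waw1" },
    spec: {
      type: "LoadBalancer",
      selector: { app: "crdb-waw1" },
      ports: [
        { name: "grpc", port: 26257, targetPort: 26257 },
        { name: "http", port: 8080, targetPort: 8080 },
      ],
    },
  }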
Change-Id: I3bbd6f01b82c6efa458cc44776f086ba36e9f20c
Nixops requires nix_rules, which in turn requires a working nix
installation.
When we split tools/install.sh into tools/install.sh and
cluster/tools/install.sh [1], we accidentally made the latter always install
all cluster tools, including nixops - even if the install.sh script
detected that the system does not have Nix installed.
[1] - https://gerrit.hackerspace.pl/c/hscloud/+/81
Change-Id: Ib5357cfe125f1393b395b28062787f3f0091f549
This makes the registry automatically part of the cluster
infrastructure.
Tested by running kubecfg diff; no diffs (apart from out-of-date ACLs)
were found.
Change-Id: Ic0635e789cf3fb851f410bcf2865326f1fa87545
python_rules is completely broken when it comes to py2/py3 support.
Here, we replace it with the native Python rules from newer Bazel
versions [1] and rules_pip for PyPI dependencies [2]. rules_pip is
relatively little-known and experimental, but it seems to work much
better than what we had previously.
We also unpin rules_docker and fix .bazelrc to force Bazel into
Python 2 mode - hopefully, this repo will now work fine under
operating systems where `python` is python2 (as the standard
dictates).
[1] - https://docs.bazel.build/versions/master/be/python.html
[2] - https://github.com/apt-itude/rules_pip
Change-Id: Ibd969a4266db564bf86e9c96275deffb9610dd44
IP addresses are not necessary in the topology definitions of a
Cockroach cluster.
They were mis-committed leftovers from trying to run the cluster on
DaemonSets with hostNetworking: true.
Change-Id: I4ef1f6ed9a745efc6b05846bc13aba9d1f8dc7c8
This prevents a bug where kubecfg fails to update the client pod when
running a cluster/kube/cluster.jsonnet update. The pod update is
attempted because of runtime/intent differences in the serviceAccount
specification, which causes kubecfg to see a diff, which causes it to
attempt an update, which causes kube-apiserver to reject the change
(because pods are immutable), which causes kubecfg to fail.
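One way such a spurious diff can be avoided (a guess at the shape of
the fix, not necessarily what this change actually does) is to declare
the field explicitly in the intent, so it matches what kube-apiserver
fills in at runtime:

  // Hypothetical illustration: pinning serviceAccountName explicitly
  // so the declared pod matches the server-side object.
  {
    apiVersion: "v1",
    kind: "Pod",
    metadata: { name: "client" },
    spec: {
      serviceAccountName: "default",
      containers: [{ name: "client", image: "alpine:3.10" }],
    },
  }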
Change-Id: I20b0ecbb264213a2eb483d475c7683b4965c82be
We move away from the StatefulSet-based deployment to manually starting
a deployment per intended node. This allows us to pin individual
instances of Cockroach to particular nodes, so that they stay
co-located with their data.
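A sketch of what one such instance looks like as a Deployment pinned
via nodeSelector (node names, labels and the image tag are invented,
and persistent volume wiring is omitted):

  local instance(name, node) = {
    apiVersion: "apps/v1",
    kind: "Deployment",
    metadata: { name: "crdb-" + name },
    spec: {
      replicas: 1,
      selector: { matchLabels: { instance: name } },
      template: {
        metadata: { labels: { instance: name } },
        spec: {
          // Pin the instance to its node so it stays with its data.
          nodeSelector: { "kubernetes.io/hostname": node },
          containers: [{
            name: "cockroachdb",
            image: "cockroachdb/cockroach:v19.1.0",  // example tag
            command: ["cockroach", "start"],
          }],
        },
      },
    },
  };
  [
    instance("node-a", "node-a.example.org"),
    instance("node-b", "node-b.example.org"),
  ]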
We refactor this library to:
- support multiple databases, but with a strong suggestion of having
  one per k8s cluster
- drop the database creation logic
- redo naming (allowing for two options: multiple clusters per
  namespace or an exclusive namespace for the cluster)
- unhardcode dns names
This pretty large change does the following:
- moves nix from bootstrap.hswaw.net to nix/
- changes clustercfg to use cfssl (see the sketch below) and moves it
  to cluster/clustercfg
- changes clustercfg to source information about target location of
  certs from nix
- changes clustercfg to push nix config
- changes tls certs to have more than one CA
- recalculates all TLS certs
  (it keeps the old serviceaccounts key, otherwise we end up with
  invalid serviceaccounts - the cert doesn't match, but who cares,
  it's not used anyway)
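For the cfssl part, each CA starts from a small CSR JSON fed to
`cfssl gencert -initca`; the values below are placeholders, not what
clustercfg actually generates:

  {
    "CN": "example cluster CA",
    "key": { "algo": "rsa", "size": 4096 },
    "names": [{ "O": "Warsaw Hackerspace" }]
  }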