hscloud

Author	SHA1	Message	Date
q3k	e7fca3acd8	ci_presubmit: init This will be, at some point, a script to run on Gerrit presubmit (ie. right before merge). For now, you can manually run it to ensure that Everything At Least Kinda Works. Change-Id: I28b305fa81a4ca4a8e94ce4daa06fe9ae0184fe8	2020-09-25 21:15:07 +00:00
q3k	f00a701f27	tools: remove unused go_sdk.bzl This is a leftover from an old attempt at NixOS compatibility. Change-Id: I5050f76b83f47796cdfa6235db8ee5efe8daf3e2	2020-09-25 21:01:12 +00:00
q3k	4e8622df35	djtest: use pyelftools to find uwsgi ld.so Change-Id: I54bdaa588ff15d8c6ca73c4307076a93a5682d78	2020-09-25 21:00:11 +00:00
q3k	a5ed644980	k0.hswaw.net: pass metallb through Calico Previously, we had the following setup: [diagram: dcsw01 <----- metallb, once per node (dcr01s22, dcr01s24, ...)] Ie., each metallb on each node directly talked to dcsw01 over BGP to announce ExternalIPs to our L3 fabric. Now, we rejigger the configuration to instead have Calico's BIRD instances talk BGP to dcsw01, and have metallb talk locally to Calico. [diagram: dcsw01 <----- Calico <-- metallb, with Calico and metallb both inside node dcr01s24] This makes Calico announce our pod/service networks into our L3 fabric! Calico and metallb talk to each other over 127.0.0.1 (they both run with Host Networking), but that requires one side to flip to passive mode. We chose to do that with Calico, by overriding its BIRD config and special-casing any 127.0.0.1 peer to enable passive mode. We also override Calico's Other Bird Template (bird_ipam.cfg) to fiddle with the kernel programming filter (ie. to-kernel-routing-table filter), where we disable programming unreachable routes. This is because routes coming from metallb have their next-hop set to 127.0.0.1, which makes bird mark them as unreachable. Unreachable routes in the kernel will break local access to ExternalIPs, eg. register access from containerd. All routes pass through without route reflectors and a full mesh as we use eBGP over private ASNs in our fabric. We also have to make Calico aware of metallb pools - otherwise, routes announced by metallb end up being filtered by Calico. This is all mildly hacky. Here's hoping that Calico will be able to some day gain metallb-like functionality, ie. IPAM for externalIPs/LoadBalancers/... There seems to be, however, one problem with this change (but I'm not fixing it yet as it's not critical): metallb would previously only announce IPs from nodes that were serving that service. Now, however, the Calico internal mesh makes those appear from every node. This can probably be fixed by disabling local meshing, enabling route reflection on dcsw01 (to recreate the mesh routing through dcsw01). Or, maybe by some more hacking of the Calico BIRD config :/. Change-Id: I3df1f6ae7fa1911dd53956ced3b073581ef0e836	2020-09-23 18:55:12 +00:00
q3k	0dd5195766	hackdoc: bump Change-Id: I027a7d8f30d55773ec0e2ec7700bd780e417cb19	2020-09-23 18:31:35 +00:00
q3k	2b8f3c4af7	Merge changes Ib91e4d3b,I5d41fa12,I839863a8 * changes: hackdoc: render TOC inline hackdoc: fix pub_listen flag in readme hackdoc: do not add ?ref= to intra-links unless necessary	2020-09-23 18:14:49 +00:00
q3k	0a2f413b4c	hackdoc: render TOC inline Change-Id: Ib91e4d3b73354e7e19095ea62eed70a23ef96512	2020-09-23 18:13:20 +00:00
q3k	80380f4444	hackdoc: fix pub_listen flag in readme Change-Id: I5d41fa12f29ec5cff9251bb0ad77fc5fdafef786	2020-09-23 18:13:20 +00:00
q3k	26f44da5f1	hackdoc: do not add ?ref= to intra-links unless necessary Change-Id: I839863a8c10c54fae11100b885c972bed348eba6	2020-09-23 18:13:20 +00:00
q3k	059fdfed3b	k0: add resource requests/limits to nginx, remove gitea We just had an outage seemingly caused by N-I-C sending tons of traffic to gitea, which in turn caused N-I-C to balloon in memory/CPU usage. I haven't debugged the cause of this traffic, but I have disabled the gitea TCP forward to Stop The Bleeding. This change reflects ad-hoc production changes. Change-Id: I37e11609f408fa3e3fbfafafba44dc83149b90a9	2020-09-20 22:53:40 +00:00
q3k	242ec58a33	k0: add waw-hdd-redundant-q3k-3 Change-Id: Id3718877d1e67d48c6726d7649a565db657cfc82	2020-09-20 15:36:24 +00:00
q3k	c09d8fedcc	Merge "app/onlyoffice: init"	2020-09-16 16:59:06 +00:00
q3k	5533ce9075	matrix: bump synapse to 1.19.2 This has already been deployed to production. Change-Id: I0ebf818193bd161d6565a9ec4eddc785e79d9077	2020-09-16 14:20:09 +00:00
q3k	06b61d4d47	app/onlyoffice: init This deploys office.hackerspace.pl. It's a collaborative document editing server that works with Nextcloud. This is already live, and can be tested with owncloud.hackerspace.pl (new -> document). Change-Id: Ic8055a8a6679e7a0695ebb9e41108074d8f789af	2020-09-15 18:23:08 +00:00
q3k	1230ac38b5	matrix: enable metrics Change-Id: Ia916cb1311ab079153ba37818455170e85e437bc	2020-09-12 22:26:12 +00:00
patryk	8d069d8d1a	cluster/certs: refresh prodvider CA Change-Id: I35578fb62ddf10e7419c2c347e70322cf4ea0b6a	2020-09-01 22:02:52 +00:00
radex	81da4e5823	laserproxy: extend deadline to 60min & random changes Change-Id: I2601d2da8da567d8dd6beecc630de911d5d161c3	2020-08-28 19:52:38 +02:00
radex	30b6be82e6	Revert "radex: test" This reverts commit `04f9d2e2f1`. Reason for revert: <INSERT REASONING HERE> Change-Id: If29d212656ef30cf9cf53f507ff029f83c9da028	2020-08-27 20:36:46 +00:00
radex	04f9d2e2f1	radex: test Change-Id: I780578d44eac4e81624b88e20aa7da85b8fd5505	2020-08-27 20:33:26 +00:00
q3k	dc496d21a1	Merge "cluster/nix: update nodes"	2020-08-27 15:13:51 +00:00
q3k	1db03c32b6	matrix: fix iOS signup issues by specifying public_baseurl WHITE WHALE HOLY GRAIL Complex systems are complex. Let me tell you a story about that. Matrix clients perform their last stage of login by performing a POST to /_matrix/client/r0/login on the Matrix homeserver they log in to. How they reach the Homeserver is specified earlier - either by using discovery via SRV or .well-known, or by the client manually specifying the Matrix homeserver URL. Regardless of how they reach this endpoint in the first place, this POST endpoint, as per the Matrix Client-Server API Specification (r0.6.1), MAY return a `well_known` key, which MUST contain a `homeserver` address, pointing to the address of the homeserver which the client should talk to. If present, the client SHOULD use that instead of whatever it connected to so far. Issue the first: the iOS client requires `well_known` in that response, and doesn't work otherwise. https://github.com/vector-im/element-ios/issues/3448 Issue the second: Synapse will return `well_known` accordingly, but only if `public_baseurl` is set in its configuration. It is not required to be set. If not set, it will simply not return this key. Shrek the third: we never set `public_baseurl` in Synapse, and the first issue (iOS needing `well_known`) only became a regression in https://github.com/vector-im/element-ios/issues/2715 . As such, it was difficult to troubleshoot this issue, and we kept getting on some red herrings: is it the SSO? Is our server broken? Is the iOS implementation broken? But now we know - https://github.com/vector-im/element-ios/issues/2715 seems to be the true culprit. Change-Id: I913792e31e3c6813d4e51d4befdba720cad3f532	2020-08-26 18:10:36 +00:00
q3k	de6275101b	matrix: add Telegram bridge appservice. Configuring this one is a bit different from appservice-irc. Notably, there's no way to give it a registration.yaml to overlay on top of a config, so we end up using an init container with yq to do that for us. Also, I had to manually copy the registration.yaml in synapse, from /appservices/telegram-prod/registration.yaml to /data/appservices/telegram-prod.jsonnet, in order to make it work with the synapse docker start magic. :/ Otherwise, this is deployed and seems to be working. Change-Id: Id747a0e310221855556c1d280439376f0c4e5ed6	2020-08-24 21:20:39 +00:00
q3k	cdba291e7d	matrix: split up appservice to separate file This is in preparation for adding a Telegram bridge appservice. The main jsonnet file was getting quite chonky. This does not affect production, and is just a refactor. Change-Id: I7cdee2bd71aedb40a9f6c3e5148f829023171dcb	2020-08-24 19:14:04 +00:00
q3k	c0c037aad9	app/matrix: migrate postgres and data to waw3 The way this was migrated is not to be spoken of. (hint: it involved downtime, and mounting two volumes at once) appservice-irc has some storage, we should migrate that to waw3, too. But it's not as critical. The new storage (waw3) is _much_ faster. Change-Id: I4b4bd32e4fedc514753d25bac35d001e8a9c5f00	2020-08-24 19:12:08 +00:00
q3k	35d437883b	kube/policies: implement mostlysecure This now allows running apt and should allow running most upstream docker images. In return, we prohibit some mildly sketchy stuff. But this is safe enough for project namespaces with limited administrative access. We should still get gvisor sooner rather than later... Change-Id: Ida5ccfae440bacb6f3fd55dcc34ca0addfddd5ae	2020-08-23 11:32:44 +00:00
q3k	ed71be4392	Merge "devtools: fix sourcegraph"	2020-08-23 11:06:27 +00:00
q3k	b7898a8038	devtools: fix sourcegraph Permissions get mangled on container restart. This adds an init container to fix them. Change-Id: I37c44e23a75b8ec41e6aba2ed38eee223496b8b9	2020-08-23 11:05:57 +00:00
q3k	99db0cd62f	Merge "cluster/clustercfg: fix BUILD"	2020-08-23 01:38:25 +00:00
q3k	1b15dc46ea	app/matrix: move appservice-irc to bc01n03 When deploying https://gerrit.hackerspace.pl/c/hscloud/+/401 we manually re-pinned appservice-irc to run on bc01n03 (to prevent reschedule as bc01n02 was updated while bc01n03 was already done). This change makes git reflect production. Change-Id: I2518a8a227bfacefd9f1905ded5a1d65e379845f	2020-08-23 01:03:00 +02:00
q3k	316411790a	cluster/nix: update nodes - we update NixOS to 20.09pre - we fix an ACME option that's now required - we switch from systemd-timesyncd to chrony (as timesyncd took a long time to sync clocks after restart, leading to MON_CLOCK_SKEW errors from ceph) This has been deployed in production. Change-Id: Ibfcd41567235bae3e3d8abeeed61f4694ae614ad	2020-08-23 00:58:29 +02:00
q3k	bc73a44519	cluster/clustercfg: fix BUILD This is continued fallout after migrating from rules_pip. Change-Id: Idb9b4d4f22aa36512d220ac31375bae7a0f25e4e	2020-08-22 20:33:37 +00:00
q3k	31e41d5ff7	Merge changes I4ecc5002,Iff21654e,I312be8e8 * changes: kube/kube.libsonnet: add OpenAPI.Require kube/kube.libsonnet: add Contain to Namespace kube/kube.libsonnet: add CertificateVolume	2020-08-22 20:32:02 +00:00
q3k	d5918c8e72	cluster: change q3k's laptop key Paranoia is dead, long live Mimeomia. This has already been deployed to production. Change-Id: Ibbc5015b5277380a3450f76e62d3fab6e71be1a0	2020-08-22 22:29:42 +02:00
q3k	0b6d5d526f	kube/kube.libsonnet: add OpenAPI.Require This allows for the following: local oa = kube.OpenAPI, validation: oa.Validation(oa.Dict { foo: oa.Required(oa.String), bar: oa.Required(oa.Array(oa.Dict { baz: oa.Boolean, })), }), No more `oa.String { required:: true }`! Change-Id: I4ecc5002e83a8a1cfcdf083d425d7decd4cf8871	2020-08-22 19:01:01 +00:00
q3k	5a89d225e7	kube/kube.libsonnet: add Contain to Namespace This allows for the following: ns: kube.Namespace("foo"), service: self.ns.Contain(kube.Service("bar")) { spec+: { // ... }, }, No more `metadata+: { namespace: ... }`! Change-Id: Iff21654e18919afbe60c574e560356c6bd6d9b89	2020-08-22 18:57:30 +00:00
q3k	394dd83219	kube/kube.libsonnet: add CertificateVolume CertificateVolume is like SecretVolume, but for secrets generated from Certificates. Change-Id: I312be8e84c856221173583df478ec5317aa948c0	2020-08-22 18:56:53 +00:00
q3k	8887655aa8	go/mirko: fix trace logging Change-Id: I95b8ce32ad529ffe0b43282f5761495df78b2b10	2020-08-16 13:25:40 +00:00
q3k	b97a303f89	Merge "hswaw/ldapweb: bump"	2020-08-15 18:44:03 +00:00
q3k	fceedd1bab	hswaw/ldapweb: bump This pulls in https://code.hackerspace.pl/q3k/ldap-web-public/commit/?id=1cced0d613f4ec8b454c1a6c6fd9bb01eed391e3 Change-Id: Ib676d09084bf1bd00bfa88eab980353550525729	2020-08-15 18:43:46 +00:00
q3k	0581bbf8a0	games/factorio: add modproxy This adds a mod proxy system, called, well, modproxy. It sits between Factorio server instances and the Factorio mod portal, allowing for arbitrary mod download without needing the servers to know Factorio credentials. Change-Id: I7bc405a25b6f9559cae1f23295249f186761f212	2020-08-14 13:03:46 +02:00
q3k	791ab6d1a5	factorio: bump to 1.0.0 Change-Id: I24c96e556ae4054fb1b25e671341f2cb671010c2	2020-08-14 10:35:28 +00:00
q3k	15db04c705	hackdoc: deploy There's an issue with the registry that forbids me from pushing into anything but my personal namespace - might have been introduced by `0697e01144` . For now, I move the hackdoc image to my personal namespace, as at some point in the future I want to revamp the registry system, anyway. We also drive-by fix a mirko.libsonnet typo that, for some reason, hasn't manifested itself yet. Change-Id: I8544e4a52610fb84c5c9d8b0de449f785248f60f	2020-08-10 18:57:26 +02:00
q3k	d40bd1bd71	README: link to cs instead of gitiles Change-Id: Iaaa6cbe1327fc75dfd642bbfe5677740bb9b2fb6	2020-08-10 18:03:04 +02:00
q3k	77a5a4b388	Merge "hackdoc: do not render links to pages that wouldn't serve anything"	2020-08-10 16:01:51 +00:00
q3k	d701c4ebc6	hackdoc: do not render links to pages that wouldn't serve anything This gets rid of annoying clickable 404 links. Change-Id: Ibf767875af29f4571e7f935d494b44dde002fac6	2020-08-10 18:01:13 +02:00
q3k	03c9a5ed86	app/matrix: add q3k to OWNERS (apparently these don't get inherited?) Change-Id: Ie0052677585863da6dade8c184e25b8c15ddf42c	2020-08-05 23:04:29 +02:00
q3k	fe33aa6489	Merge "third_party/py: bump cffi and psycopg2 to latest versions"	2020-08-05 20:58:12 +00:00
q3k	970b7687f3	factorio: bump all to 0.18.40 Change-Id: Iaf9b28ce6fed9ba791075307ee3e75f218267d23	2020-08-04 20:33:25 +02:00
q3k	3d29484ebb	k0: move registry to ceph-waw3 ceph-waw2 currently has some production issues [1] which have started to cause write failures in the registry. The registry is the only user of ceph-waw2's affected pool, so we reduce the dumpster fire blast radius by moving it over to ceph-waw3. This has already been deployed and data has been migrated over (via s3cmd sync), and the migration has been verified (by a push and pull, and pull of an older image). [1] - pgs stuck inactive in the object storage pool Change-Id: I26789b52008bb7be953954ec3fd3dd727ac15347	2020-08-04 01:36:51 +02:00
q3k	1773f32c8a	factorio: bump to 0.18.40 Change-Id: I065a5e8a8c6608a137c0ae4f1cb04f8254ef6ddd	2020-08-01 22:02:38 +02:00

1008 Commits (20c6bcb7305d4b85c5fd6dfc72c04c68b772d15f)
