This is used by some external modules (appservices/instance
definitions). To reduce the scope of (untested) changes in this
rollout, let's temporarily backport that function into matrix-ng.
Change-Id: Ib1054844391497ef1455b25c7f939c68c628ff09
Splitting matrix-ng into multiple submodules changes some keys that
might have been used to customize the homeserver/riot configuration.
Migrating to kube.Namespace.Contain has also changed the Deployment
selectors, which are immutable fields, so the affected Deployments need
to be removed manually first.
This is, as always, documented in lib/matrix-ng.libsonnet header.
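Because `spec.selector` on an apps/v1 Deployment is immutable, an apply that changes it is rejected rather than rolled out. Below is a minimal sketch, assuming manifests are available as plain dicts (for example from `kubectl get -o json` and the rendered jsonnet), of how one might list which existing Deployments must be deleted before the rollout. The function names, namespace and labels are hypothetical and not part of hscloud.

```python
def selector_changed(live, desired):
    """True if spec.selector differs between two Deployment manifests."""
    old = live.get("spec", {}).get("selector")
    new = desired.get("spec", {}).get("selector")
    return old != new


def deployments_to_recreate(live_objs, desired_objs):
    """Names of existing Deployments whose selector changed.

    Objects are matched by (namespace, name). Deployments that do not exist
    in the cluster yet are skipped, since they can simply be created.
    """
    def key(obj):
        md = obj.get("metadata", {})
        return (md.get("namespace"), md.get("name"))

    live = {key(o): o for o in live_objs if o.get("kind") == "Deployment"}
    names = []
    for d in desired_objs:
        if d.get("kind") != "Deployment":
            continue
        k = key(d)
        if k in live and selector_changed(live[k], d):
            names.append(k[1])
    return sorted(names)


# Hypothetical example: the desired synapse selector gained an extra label.
def deployment(name, labels):
    return {
        "kind": "Deployment",
        "metadata": {"namespace": "matrix", "name": name},
        "spec": {"selector": {"matchLabels": labels}},
    }


live = [deployment("synapse", {"app": "synapse"}), deployment("riot", {"app": "riot"})]
desired = [
    deployment("synapse", {"app": "synapse", "component": "main"}),
    deployment("riot", {"app": "riot"}),
    deployment("media-worker", {"app": "media"}),
]
print(deployments_to_recreate(live, desired))  # ['synapse']
```

Only `synapse` is reported: `riot` is unchanged and `media-worker` does not exist yet.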
Change-Id: I39a745ee27e3c55ec748818b9cf9b4e8ba1d2df5
This is a major revamp of our matrix/synapse deployment as a separate
.libsonnet module.
* synapse version bump to 1.25.0
* riot-web version bump to 1.7.18
* Replaced the synapse migration hack we used to template configuration with
environment variable replacement done by Kubernetes itself
* Implemented support for OpenID Connect; migration from CAS has been
verified to work with some additional configuration options
* Moved the homeserver signing key into a k8s secret, making it possible
to run synapse processes without all of them sharing a single data volume
* Split synapse into a main process, a generic worker and a media
repository worker (the latter is the only container using the data
volume). Both the generic worker and the media repository worker run as
a single replica until we get proper HTTP routing/load balancing
* Riot nginx.conf has been extracted into an external file loaded using
importstr.
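Assuming the environment variable replacement mentioned above refers to Kubernetes' `$(VAR_NAME)` expansion in a container's `command`/`args`, the sketch below models those rules: defined variables are substituted, references to undefined variables are left untouched, and `$$` escapes a literal `$`. It is an illustration, not the kubelet's code, and it restricts variable names to identifier characters for simplicity. The variable name in the example is hypothetical.

```python
import re

# "$$" is an escaped literal "$"; "$(NAME)" is a variable reference.
# Simplification: NAME is limited to identifier characters here.
_REF = re.compile(r"\$\$|\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand(arg, env):
    """Expand $(NAME) references in one container argument.

    Defined variables are substituted, undefined references are kept as-is,
    and "$$" collapses to a single "$".
    """
    def repl(m):
        if m.group(0) == "$$":
            return "$"
        return env.get(m.group(1), m.group(0))

    return _REF.sub(repl, arg)


# Hypothetical variable name, for illustration only.
env = {"SERVER_NAME": "example.org"}
print(expand("server_name: $(SERVER_NAME)", env))  # server_name: example.org
print(expand("$(UNSET) $$(SERVER_NAME)", env))     # $(UNSET) $(SERVER_NAME)
```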
Change-Id: I6c4d34bf41e148a302d1cbe725608a5aeb7b87ba
Exposes /.well-known/matrix/ metadata endpoints on cfg.webDomain that
are required for federation to work properly. This can be enabled using
cfg.wellKnown flag set to true.
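For reference, the Matrix spec defines two small JSON documents under /.well-known/matrix/: `m.server` for server-server delegation and `m.homeserver.base_url` for client discovery. The sketch below only shows the shape of those bodies; the hostnames are hypothetical placeholders, not the values derived from cfg.webDomain.

```python
import json


def well_known_bodies(delegated_host, client_base_url, port=443):
    """Map /.well-known/matrix/ paths to their JSON response bodies."""
    return {
        # Server-server API: where other homeservers should connect.
        "/.well-known/matrix/server": json.dumps(
            {"m.server": f"{delegated_host}:{port}"}
        ),
        # Client-server API: base URL that clients should use.
        "/.well-known/matrix/client": json.dumps(
            {"m.homeserver": {"base_url": client_base_url}}
        ),
    }


bodies = well_known_bodies("matrix.example.org", "https://matrix.example.org")
for path, body in bodies.items():
    print(path, body)
```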
Change-Id: I097b58efc7442b904a135d4519999e36d155c197
It reached the stage of being crapped out so much that the OSDs' spurious
IOPS killed the performance of disks colocated on the same M610 RAID
controllers. This made etcd _very_ slow, to the point of churning
through re-elections due to timeouts.
etcd/apiserver latencies, observe the difference at ~15:38:
https://object.ceph-waw3.hswaw.net/q3k-personal/4fbe8d4cfc8193cad307d487371b4e44358b931a7494aa88aff50b13fae9983c.png
I moved the gerrit/* and matrix/appservice-irc-freenode PVCs to ceph-waw3
by hand. The rest were non-critical, so I removed them; they can be
recovered from benji backups if needed.
Change-Id: Iffbe87aefc06d8324a82b958a579143b7dd9914c
This is in preparation for spinning up a staging/QA matrix instance,
where the MXID domain is under the control of hscloud machinery (and not a
top-level organizational domain).
Change-Id: I10505615ebb407b3b2eac0c1b87ad5625e2009c0
This is in preparation for bringing up a Matrix server for hsp.sh.
Verified to cause no diff on prod.
Change-Id: Ied2de210692e3ddfdb1d3f37b12893b214c34b0b
This deploys office.hackerspace.pl. It's a collaborative document
editing server that works with Nextcloud.
This is already live, and can be tested with owncloud.hackerspace.pl
(new -> document).
Change-Id: Ic8055a8a6679e7a0695ebb9e41108074d8f789af
WHITE
WHALE
HOLY
GRAIL
Complex systems are complex. Let me tell you a story about that.
Matrix clients perform the last stage of login by sending a POST to
/_matrix/client/r0/login on the Matrix homeserver they log in to. How
they reach the homeserver is determined earlier: either by discovery
via SRV or .well-known, or by the client manually specifying the Matrix
homeserver URL.
Regardless of how they reach this endpoint in the first place, this POST
endpoint, as per the Matrix Client-Server API Specification (r0.6.1),
MAY return a `well_known` key, which MUST contain an `m.homeserver`
object whose `base_url` points to the homeserver which the client
should talk to. If present, the client SHOULD use that instead of
whatever it has connected to so far.
Issue the first: the iOS client requires `well_known` in that response,
and doesn't work otherwise. https://github.com/vector-im/element-ios/issues/3448
Issue the second: Synapse will return `well_known` accordingly, but only
if `public_baseurl` is set in its configuration. It is not required to
be set. If not set, it will simply not return this key.
Shrek the third: we never set `public_baseurl` in Synapse, and the first
issue (iOS needing `well_known`) only became a regression in
https://github.com/vector-im/element-ios/issues/2715 . As such, it was
difficult to troubleshoot this issue, and we kept chasing red
herrings: is it the SSO? Is our server broken? Is the iOS implementation
broken?
But now we know - https://github.com/vector-im/element-ios/issues/2715
seems to be the true culprit.
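The interaction described above can be modelled in a few lines. This is a simplified sketch, not Synapse's or Element's code: the server side includes `well_known` only when `public_baseurl` is configured, and the client side prefers `well_known.m.homeserver.base_url`, falling back to the URL it already used. The `require_well_known` flag is a hypothetical stand-in for the affected iOS behaviour, and the user IDs and URLs are placeholders.

```python
def login_response(user_id, access_token, public_baseurl=None):
    """Simplified homeserver /login response.

    `well_known` is only included when public_baseurl is configured.
    """
    resp = {"user_id": user_id, "access_token": access_token}
    if public_baseurl:
        resp["well_known"] = {"m.homeserver": {"base_url": public_baseurl}}
    return resp


def pick_homeserver(connected_url, resp, require_well_known=False):
    """Choose the homeserver URL a client uses after login.

    Prefers well_known.m.homeserver.base_url, else keeps connected_url.
    require_well_known models a client that fails without the key.
    """
    base = resp.get("well_known", {}).get("m.homeserver", {}).get("base_url")
    if base:
        return base
    if require_well_known:
        raise RuntimeError("login response has no well_known")
    return connected_url


# Without public_baseurl: a lenient client keeps its URL, a strict one fails.
r = login_response("@alice:example.org", "tok")
print(pick_homeserver("https://hs.example.org", r))
# With public_baseurl set, both kinds of client get the same answer.
r = login_response("@alice:example.org", "tok", "https://matrix.example.org")
print(pick_homeserver("https://hs.example.org", r, require_well_known=True))
```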
Change-Id: I913792e31e3c6813d4e51d4befdba720cad3f532
Configuring this one is a bit different from appservice-irc. Notably,
there's no way to give it a registration.yaml to overlay on top of a
config, so we end up using an init container with yq to do that for us.
Also, I had to manually copy the registration.yaml in synapse, from
/appservices/telegram-prod/registration.yaml to
/data/appservices/telegram-prod.jsonnet, in order to make it work with
the synapse docker start magic. :/
Otherwise, this is deployed and seems to be working.
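The init-container step amounts to overlaying one YAML document on top of another. Below is a minimal sketch of that kind of recursive merge on already-parsed dicts, where values from the overlay win. The keys in the example are hypothetical, and the exact merge semantics of `yq` (in particular for lists) depend on its version and the expression used.

```python
import copy


def overlay(base, patch):
    """Recursively merge patch into a copy of base; patch wins on conflicts.

    Nested mappings are merged key by key; any other value (scalars, lists)
    from patch replaces the value in base. Neither input is modified.
    """
    out = copy.deepcopy(base)
    for k, v in patch.items():
        if isinstance(v, dict) and isinstance(out.get(k), dict):
            out[k] = overlay(out[k], v)
        else:
            out[k] = copy.deepcopy(v)
    return out


# Hypothetical config and registration-derived overlay.
base = {
    "homeserver": {"address": "http://synapse:8008", "domain": "example.org"},
    "appservice": {"id": "telegram", "as_token": "placeholder"},
}
patch = {"appservice": {"as_token": "from-registration"}}
print(overlay(base, patch)["appservice"])
```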
Change-Id: Id747a0e310221855556c1d280439376f0c4e5ed6
This is in preparation for adding a Telegram bridge appservice. The main
jsonnet file was getting quite chonky.
This does not affect production, and is just a refactor.
Change-Id: I7cdee2bd71aedb40a9f6c3e5148f829023171dcb
The way this was migrated is not to be spoken of.
(hint: it involved downtime, and mounting two volumes at once)
appservice-irc has some storage too, and we should migrate that to waw3
as well, but it's not as critical.
The new storage (waw3) is _much_ faster.
Change-Id: I4b4bd32e4fedc514753d25bac35d001e8a9c5f00
When deploying https://gerrit.hackerspace.pl/c/hscloud/+/401 we manually
re-pinned appservice-irc to run on bc01n03 (to prevent a reschedule, as
bc01n02 was updated while bc01n03 was already done). This change makes
git reflect production.
Change-Id: I2518a8a227bfacefd9f1905ded5a1d65e379845f
This has already been bumped in production, and this change makes it
reflect that.
This was supposed to fix iOS sign-in, but that doesn't seem to have
worked.
Change-Id: I9278490e40b332a8439fdf1361f27df770b8cd9e
At some point someone bumped appservice-irc to 0.17.1 without committing
it to git. This fixes that, and also drive-by refactors the
appservice-irc image version to live next to all the other version
strings.
`kubecfg diff --diff-strategy=subset prod.jsonnet` now shows no diff.
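`--diff-strategy=subset` only compares fields that are set in the local config, so defaults and status filled in by the API server do not show up as drift. Below is a rough sketch of that idea on plain dicts; it is not kubecfg's implementation, it compares lists as whole values, and the image tags in the example are hypothetical.

```python
def subset_diff(desired, live, path=""):
    """Paths where live differs from desired, ignoring extra live fields.

    Only keys present in `desired` are compared; anything the server added
    to `live` (defaults, status) is ignored. Lists are compared whole.
    """
    diffs = []
    if isinstance(desired, dict) and isinstance(live, dict):
        for k, v in desired.items():
            sub = f"{path}.{k}" if path else k
            if k not in live:
                diffs.append(sub)
            else:
                diffs.extend(subset_diff(v, live[k], sub))
    elif desired != live:
        diffs.append(path)
    return diffs


live = {
    "spec": {"image": "appservice-irc:0.17.1", "imagePullPolicy": "IfNotPresent"},
    "status": {"ready": True},
}
print(subset_diff({"spec": {"image": "appservice-irc:0.17.1"}}, live))  # []
print(subset_diff({"spec": {"image": "appservice-irc:OLD"}}, live))     # ['spec.image']
```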
Change-Id: I90a64d05cc72669de41fa68195672adca2eb37e8