This has been encountered when introducing redis in our production
matrix deployment. /data partition is owned by root:root by default
otherwise.
Change-Id: Ic148ff25837c6e8da394a5124556481343ea2873
This is used by some external modules (appservices/instance
definitions). In order to reduce scope of (untested) changes in this
rollout, let's temporarily backport that function into matrix-ng.
Change-Id: Ib1054844391497ef1455b25c7f939c68c628ff09
matrix-ng split into multiple submodules causes some changes in keys
that might've been used for homeserver/riot configuration customization.
Migration to kube.Namespace.Contain has also caused change in Deployment
selectors (immutable fields), thus needing manual removal of these
first.
This is, as always, documented in lib/matrix-ng.libsonnet header.
Change-Id: I39a745ee27e3c55ec748818b9cf9b4e8ba1d2df5
This turns admitomatic into a self-standing service that can be used as
an admission controller.
I've tested this E2E on a local k3s server, and have some early test
code for that - but that'll land up in a follow up CR, as it first needs
to be cleaned up.
Change-Id: I46da0fc49f9d1a3a1a96700a36deb82e5057249b
This gives us nearly everything required to run the admission
controller. In addition to checking for allowed domains, we also do some
nginx-inress-controller security checks.
Change-Id: Ib187de6d2c06c58bd8c320503d4f850df2ec8abd
This is a major revamp of our matrix/synapse deployment as a separate
.libsonnet module.
* synapse version bump to 1.25.0
* riot-web version bump to 1.7.18
* Replaced synapse migration hack we used to template configuration with
environment variable replacement done by Kubernetes itself
* Implemented support for OpenID Connect, migration from CAS has been
verified to be working with some additional configuration options
* Moved homeserver signing key into k8s secret, thus making it possible
to run synapse processes without a single data volume
* Split synapse into main process, generic worker and media repository
worker. (latter is the only container using data volume) Both generic
worker and media repository worker is running on a single replica, until
we get proper HTTP routing/loadbalancing
* Riot nginx.conf has been extracted into an external file loaded using
importstr.
Change-Id: I6c4d34bf41e148a302d1cbe725608a5aeb7b87ba
This is the beginning of a validating admission controller which we will
use to permit end-users access to manage Ingresses.
This first pass implements an ingressFilter, which is the main structure
through which allowed namespace/dns combinations will be allowed. The
interface is currently via a test, but in the future this will likely be
configured via a command line, or via a serialized protobuf config.
Change-Id: I22dbed633ea8d8e1fa02c2a1598f37f02ea1b309
This option allows easy customization of certain initial database
properties, like encoding or collation. See:
https://www.postgresql.org/docs/9.5/app-initdb.html
Adding this option in already existing deployments will only cause
postgres pod restart, but no data loss or schema changes!
Intended to be used in further matrix deployment cleanups.
Change-Id: I49a017c21a228f983bea6bafa7dac962a75d05c9
Exposes /.well-known/matrix/ metadata endpoints on cfg.webDomain that
are required for federation to work properly. This can be enabled using
cfg.wellKnown flag set to true.
Change-Id: I097b58efc7442b904a135d4519999e36d155c197
This change reflects the current production state.
Upgrade was done by going through following versions:
19.1.0 -> 19.2.12 -> 20.1.10 -> 20.2.4
Change-Id: I8b33b8116363f1a918423fd18ba3d1b5c910851c
It used to be at 128Mi, which is a bit small considering this clones
hscloud into memory.
This is a quick fix, a better thing to do would be to have some storage
for depotview to clone into, instead of serving fully from RAM.
Change-Id: I619d39a0d61f5de9bdeef1f46262c78ea33a19fc
It reached the stage of being crapped out so much that the OSDs spurious
IOPS killed the performance of disks colocated on the same M610 RAID
controllers. This made etcd _very_ slow, to the point of churning
through re-elections due to timeouts.
etcd/apiserver latencies, observe the difference at ~15:38:
https://object.ceph-waw3.hswaw.net/q3k-personal/4fbe8d4cfc8193cad307d487371b4e44358b931a7494aa88aff50b13fae9983c.png
I moved gerrit/* and matrix/appservice-irc-freenode PVCs to ceph-waw3 by
hand. The rest were non-critical so I removed them, they can be
recovered from benji backups if needed.
Change-Id: Iffbe87aefc06d8324a82b958a579143b7dd9914c
More as-builts. This has already been bumped. Had to coax ceph-waw2 to
upgrade despite the fact that it's horribly broken.
Change-Id: Ia762f5d7d88d6420c2fc25cf199037cbccde0cb3
This is after the monster^Wrook outage of the week two weeks ago caused
by bc01n03 dying.
Plan is to migrate ceph-waw3 to be external, yeet ceph-waw2, and extend
crdb-waw1 to another node.
Change-Id: I133af3b1171fea383b45bf06c51e48a5c40341e4
This call will return a stream of repeated Invoices, in order to submit
monthly audit summaries to accounting, including PDFs and JPK_V7 codes
(ie. GTU and SP codes).
Change-Id: Id9da2952a6358c5c2c737eee08c473c1fbcfbe7d
Also drive-by fix two proto issues:
- rename gtu_codes to gtu_code (following convention)
- move denormalized Item.due_date field past denormalized comment.
Change-Id: Ibfe0a21aadc0a5d4e2f784b182e530b9603aae62
per bazel error message:
DEBUG: Rule X indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = Y
Change-Id: I1c29609197d776536b7bc0336858047d7494d795
This is a basic grafana running on:
https://monitoring-global-dashboard.k0.hswaw.net/
It contains a data source pointing at the corresponding global victoria
metrics. There's no dashboards, these will be provisioned soon via
jsonnet/grafonnet.
Change-Id: I84873bc323d1727096e3ce818fae122a9af3e191