705 Commits (9e3ca9c84108453dd958b365eaf56a797832a6bb)

Author SHA1 Message Date
q3k 9e3ca9c841 ops/sso: move jsonnets to kube/
This is in preparation for moving the sso source code into hscloud.

Change-Id: I4325df617dc82c17fb4c96762743f0b70122976f
2021-01-31 15:52:06 +01:00
q3k 2fbd0710f5 Merge changes I46da0fc4,Ib187de6d
* changes:
  cluster/admitomatic: finish up service
  cluster/admitomatic: finish up ingress admission logic
2021-01-31 11:56:34 +00:00
q3k c6118649ab cluster/admitomatic: finish up service
This turns admitomatic into a self-standing service that can be used as
an admission controller.

I've tested this E2E on a local k3s server, and have some early test
code for that - but that'll land in a follow-up CR, as it first needs
to be cleaned up.

Change-Id: I46da0fc49f9d1a3a1a96700a36deb82e5057249b
2021-01-31 12:18:16 +01:00
q3k 5d2c8fcda0 cluster/admitomatic: finish up ingress admission logic
This gives us nearly everything required to run the admission
controller. In addition to checking for allowed domains, we also do some
nginx-ingress-controller security checks.

Change-Id: Ib187de6d2c06c58bd8c320503d4f850df2ec8abd
2021-01-31 12:18:16 +01:00
informatic 0c75256f48 Merge "app/matrix: matrix-ng - synapse deployment cleanup" 2021-01-30 20:58:55 +00:00
q3k 857903b6c6 Merge "cluster/admitomatic: implement basic dns/ns filtering" 2021-01-30 20:39:47 +00:00
q3k 190feb37b0 .bazelrc: switch over to PY3 (rules_docker is now fully PY3 compliant)
Change-Id: I53edb8eae81779d5b8cea36e3bec4c05ca2c6e0d
2021-01-30 20:30:48 +00:00
informatic 8ec865728e app/matrix: matrix-ng - synapse deployment cleanup
This is a major revamp of our matrix/synapse deployment as a separate
.libsonnet module.

* synapse version bump to 1.25.0
* riot-web version bump to 1.7.18
* Replaced synapse migration hack we used to template configuration with
environment variable replacement done by Kubernetes itself
* Implemented support for OpenID Connect; migration from CAS has been
verified to be working with some additional configuration options
* Moved homeserver signing key into k8s secret, thus making it possible
to run synapse processes without a single data volume
* Split synapse into main process, generic worker and media repository
worker (the latter is the only container using a data volume). Both the
generic worker and the media repository worker run on a single replica
until we get proper HTTP routing/loadbalancing
* Riot nginx.conf has been extracted into an external file loaded using
importstr.

Change-Id: I6c4d34bf41e148a302d1cbe725608a5aeb7b87ba
2021-01-30 21:18:51 +01:00
q3k 649565324b cluster/admitomatic: implement basic dns/ns filtering
This is the beginning of a validating admission controller which we will
use to permit end-users access to manage Ingresses.

This first pass implements an ingressFilter, which is the main structure
through which namespace/dns combinations will be allowed. The
interface is currently via a test, but in the future this will likely be
configured via a command line, or via a serialized protobuf config.

Change-Id: I22dbed633ea8d8e1fa02c2a1598f37f02ea1b309
2021-01-30 19:19:35 +01:00
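
The namespace/dns allow-list idea in the commit above can be sketched roughly as follows. This is a minimal illustration in Python, not the admitomatic implementation: the `IngressFilter` class, its method names, the `*.` wildcard rule, and the example namespaces and domains are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def _matches(pattern: str, host: str) -> bool:
    """Match a host against an exact name or a '*.suffix' wildcard (assumed rule)."""
    if pattern.startswith("*."):
        suffix = pattern[1:]  # e.g. ".q3k.example.org"
        # A wildcard covers any subdomain, but not the bare domain itself.
        return host.endswith(suffix) and len(host) > len(suffix)
    return host == pattern


@dataclass
class IngressFilter:
    """Hypothetical stand-in for an ingress filter: namespace -> allowed DNS patterns."""

    allowed: Dict[str, List[str]] = field(default_factory=dict)

    def allow(self, namespace: str, pattern: str) -> None:
        """Permit Ingresses in `namespace` to use hosts matching `pattern`."""
        self.allowed.setdefault(namespace, []).append(pattern.lower())

    def is_allowed(self, namespace: str, host: str) -> bool:
        """Return True only if some pattern registered for the namespace matches."""
        host = host.lower().rstrip(".")
        return any(_matches(p, host) for p in self.allowed.get(namespace, []))


# Example usage with made-up namespaces and domains.
ingress_filter = IngressFilter()
ingress_filter.allow("matrix", "matrix.example.org")
ingress_filter.allow("personal-q3k", "*.q3k.example.org")
```

Anything not explicitly registered is rejected, which is the conservative default one would want when letting end-users manage their own Ingresses.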
q3k cc2ff79f01 ops/monitoring: move grafana to sso.
Change-Id: Ib2ecf6820454a160834db2ac212b31d9d5306972
2021-01-30 17:26:47 +01:00
q3k d82807e024 Merge changes I84873bc3,I1eedb190
* changes:
  ops/monitoring: deploy grafana
  ops/monitoring: scrape apiserver, scheduler, and controller-manager
2021-01-30 16:22:44 +00:00
informatic aadb47b3c5 Merge "ops/sso: "the hackerspace oidc/oauth2 provider" deployment" 2021-01-30 16:21:45 +00:00
informatic bd36d96efb Merge "kube/postgres: expose cfg.initdbArgs" 2021-01-30 15:34:35 +00:00
informatic 77351a68c7 Merge changes Ic71cbdce,I097b58ef
* changes:
  app/matrix: cleanup irc bridge registration oneliner
  app/matrix: add wellKnown server integration
2021-01-30 15:34:25 +00:00
informatic 1816f58448 kube/postgres: expose cfg.initdbArgs
This option allows easy customization of certain initial database
properties, like encoding or collation. See:
https://www.postgresql.org/docs/9.5/app-initdb.html

Adding this option in already existing deployments will only cause
a postgres pod restart, but no data loss or schema changes!

Intended to be used in further matrix deployment cleanups.

Change-Id: I49a017c21a228f983bea6bafa7dac962a75d05c9
2021-01-30 13:14:37 +01:00
informatic ee62857c70 app/matrix: cleanup irc bridge registration oneliner
Change-Id: Ic71cbdce6bd9668754285f863fd987c63ab5386d
2021-01-30 13:10:22 +01:00
informatic 63244ca465 app/matrix: add wellKnown server integration
Exposes /.well-known/matrix/ metadata endpoints on cfg.webDomain that
are required for federation to work properly. This can be enabled using
cfg.wellKnown flag set to true.

Change-Id: I097b58efc7442b904a135d4519999e36d155c197
2021-01-30 13:10:15 +01:00
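
For context, the documents served under /.well-known/matrix/ are small JSON objects. The sketch below builds the two documents defined by the Matrix specification (`m.server` for federation delegation, `m.homeserver` for client discovery). The function name, its parameters, and the example hostnames are assumptions made for illustration; this is not the container's actual code.

```python
import json
from typing import Dict


def well_known_documents(server_host: str, federation_port: int,
                         client_base_url: str) -> Dict[str, dict]:
    """Return a mapping of well-known path -> JSON-serializable document."""
    return {
        # Tells other homeservers where to reach this server for federation.
        "/.well-known/matrix/server": {
            "m.server": f"{server_host}:{federation_port}",
        },
        # Tells clients which base URL to use for the client-server API.
        "/.well-known/matrix/client": {
            "m.homeserver": {"base_url": client_base_url},
        },
    }


# Example usage with hypothetical hostnames.
docs = well_known_documents("matrix.example.org", 443, "https://matrix.example.org")
server_body = json.dumps(docs["/.well-known/matrix/server"])
```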
q3k 8506af2c24 app/matrix/wellknown: push container
Change-Id: Ifc8fec94cdfd7c98b5c87c1c20167b34608e1eea
2021-01-29 22:55:32 +00:00
informatic d6c97596cd ops/sso: "the hackerspace oidc/oauth2 provider" deployment
Change-Id: I092b844364ed30037eff00188dcdf5d6d3c228c5
2021-01-29 23:23:09 +01:00
patryk edf14cc5f4 crdb: replace bc01n03 with dcr01s22, upgrade to v20.2.4
This change reflects the current production state.

Upgrade was done by going through the following versions:
19.1.0 -> 19.2.12 -> 20.1.10 -> 20.2.4

Change-Id: I8b33b8116363f1a918423fd18ba3d1b5c910851c
2021-01-23 23:00:29 +01:00
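
The stepwise path in the commit above presumably reflects that CockroachDB upgrades are meant to proceed one major release at a time. A minimal sketch of checking such a path follows. It assumes the YY.R.patch numbering with two major releases per year (R is 1 or 2), which matches the versions listed but is an assumption of this sketch; the helper names are hypothetical.

```python
from typing import List, Tuple


def major(version: str) -> Tuple[int, int]:
    """Split 'YY.R.patch' into its (year, release) major-version pair."""
    year, release, _patch = (int(part) for part in version.split("."))
    return year, release


def next_major(current: Tuple[int, int]) -> Tuple[int, int]:
    """Assumed release cadence: YY.1 is followed by YY.2, then (YY+1).1."""
    year, release = current
    return (year, 2) if release == 1 else (year + 1, 1)


def is_stepwise(path: List[str]) -> bool:
    """True if every hop stays on the same major or moves to the very next one."""
    majors = [major(v) for v in path]
    return all(b == a or b == next_major(a) for a, b in zip(majors, majors[1:]))


# The path from the commit message, and a path that skips releases.
upgrade_path = ["19.1.0", "19.2.12", "20.1.10", "20.2.4"]
direct_jump = ["19.1.0", "20.2.4"]
```

Here `is_stepwise(upgrade_path)` holds, while `is_stepwise(direct_jump)` does not because it skips 19.2 and 20.1.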
patryk f3153888a8 cluster/kube: Add k0-cockroach.jsonnet, add Gitea client cert
Change-Id: Ibc5db1b0114b2540b6dc806e75e9a36cf9a3bc50
2021-01-23 15:38:50 +01:00
q3k adbf560851 devtools: bump up depotview mem limit
It used to be at 128Mi, which is a bit small considering this clones
hscloud into memory.

This is a quick fix, a better thing to do would be to have some storage
for depotview to clone into, instead of serving fully from RAM.

Change-Id: I619d39a0d61f5de9bdeef1f46262c78ea33a19fc
2021-01-22 18:38:43 +01:00
q3k 61f978a0a0 *: tear down ceph-waw2
It reached the stage of being crapped out so much that the OSDs' spurious
IOPS killed the performance of disks colocated on the same M610 RAID
controllers. This made etcd _very_ slow, to the point of churning
through re-elections due to timeouts.

etcd/apiserver latencies, observe the difference at ~15:38:

https://object.ceph-waw3.hswaw.net/q3k-personal/4fbe8d4cfc8193cad307d487371b4e44358b931a7494aa88aff50b13fae9983c.png

I moved gerrit/* and matrix/appservice-irc-freenode PVCs to ceph-waw3 by
hand. The rest were non-critical so I removed them, they can be
recovered from benji backups if needed.

Change-Id: Iffbe87aefc06d8324a82b958a579143b7dd9914c
2021-01-22 16:26:09 +01:00
q3k 856b284e29 bgpwtf: edge01: add radio rooftop circuit
Change-Id: I07d6f3cb9170e1b8c5c2d8ea429d847ffa87126c
2021-01-21 20:47:42 +00:00
q3k 3b9ee5f1c0 ceph: bump to 14.2.16
More as-builts. This has already been bumped. Had to coax ceph-waw2 to
upgrade despite the fact that it's horribly broken.

Change-Id: Ia762f5d7d88d6420c2fc25cf199037cbccde0cb3
2021-01-19 21:45:26 +00:00
q3k 2c04c8410a rook: bump to 1.2.7
As-built: deployed to ceph-waw{2,3} already.

Change-Id: I27189b273cf72638cf2036681054832db99591da
2021-01-19 21:41:13 +01:00
q3k f684535c6e k0: remove bc01n03 from nix defs
This only affects ETCD_INITIAL_* env vars, so it is effectively a no-op.

Deployed to prod.

Change-Id: Ic9118e17b088d1b58ebaf1ac0708a1ee6fcf2c06
2021-01-19 20:20:33 +01:00
q3k cf842b0442 k0: reflect reality
This is after the monster^Wrook outage of the week two weeks ago caused
by bc01n03 dying.

Plan is to migrate ceph-waw3 to be external, yeet ceph-waw2, and extend
crdb-waw1 to another node.

Change-Id: I133af3b1171fea383b45bf06c51e48a5c40341e4
2021-01-19 20:08:26 +01:00
q3k f70b1be78b Merge "invoice: bump year for new databases" 2021-01-19 18:59:41 +00:00
q3k d9670d739b invoice: bump year for new databases
Change-Id: I88918b103e7b128d5fc263873ce9d2ec9a739bd7
2021-01-19 19:59:09 +01:00
q3k 1ecf22da9a invoice: add GetInvoices to proto
This call will return a stream of repeated Invoices, in order to submit
monthly audit summaries to accounting, including PDFs and JPK_V7 codes
(ie. GTU and SP codes).

Change-Id: Id9da2952a6358c5c2c737eee08c473c1fbcfbe7d
2021-01-09 21:59:04 +00:00
q3k b456c18bb2 invoice: calculate GTU codes for invoice, implement some tests
Also drive-by fix two proto issues:
 - rename gtu_codes to gtu_code (following convention)
 - move denormalized Item.due_date field past denormalized comment.

Change-Id: Ibfe0a21aadc0a5d4e2f784b182e530b9603aae62
2021-01-09 21:58:59 +00:00
q3k d67635d338 Bump riot-web on matrix.hackerspace.pl
Change-Id: Ia043a03afb85b1a149b112a2be5c29fb26d5969d
2020-12-29 22:27:32 +00:00
implr 0e2057fba9 make WORKSPACE rules reproducible
per bazel error message:
DEBUG: Rule X indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = Y

Change-Id: I1c29609197d776536b7bc0336858047d7494d795
2020-12-28 21:43:33 +01:00
implr 67c86188d7 bgpwtf/edge01: as-deployed: add qemu-bridge-helper config to fix anchorvm
Change-Id: I305c498f8332de8addac435da57ba88e1b34c7f0
2020-12-21 15:14:13 +01:00
q3k 882cd7ba81 Merge "gerrit: deploy 3.3.0" 2020-12-17 22:36:43 +00:00
q3k faa326a37d WORKSPACE: update for new gerrit
Forgot to commit in https://gerrit.hackerspace.pl/c/hscloud/+/581 .
Whoops.

Change-Id: I9605b07079e4d1a9c916e6106034f3dba98964c2
2020-12-17 22:33:32 +00:00
q3k ee2f8a37d5 gerrit: deploy 3.3.0
Change-Id: Ib48f2df4b7fd424a6a33d928d60a1a4c92c43c30
2020-12-17 23:32:30 +01:00
implr 6327f12afa Merge "edge01: systemd unit for running RIPE Atlas anchor VM" 2020-12-17 22:19:37 +00:00
q3k 4f7caf8d86 ops/monitoring: deploy grafana
This is a basic grafana running on:

    https://monitoring-global-dashboard.k0.hswaw.net/

It contains a data source pointing at the corresponding global victoria
metrics. There are no dashboards yet; these will be provisioned soon via
jsonnet/grafonnet.

Change-Id: I84873bc323d1727096e3ce818fae122a9af3e191
2020-12-17 22:10:31 +00:00
q3k cfc0496266 ops/monitoring: scrape apiserver, scheduler, and controller-manager
These get scraped by public IP address, which is retrieved via service
discovery in Prometheus (by using the endpoints role on the
default/kubernetes service).

Also drive-by fix cluster prometheus resources - the default
configuration wants at least 3GB of physical memory.

Change-Id: I1eedb19051f62b40613f69e5f0f736d5958acf42
2020-12-17 22:09:56 +00:00
q3k 70c60feea6 gerrit-oauth-provider: port Warsaw Hackerspace plugin to new API
Change-Id: Ia1260e3ebf14e410ffd94c0e74113a5bae568157
2020-12-17 23:06:02 +01:00
q3k bfa4a65f76 gerrit-oauth-provider: bump
This now tracks upstream's master at 296a0051e1692da91a9b0d3c9b878ac571dc9819

Change-Id: Id08e3a43bcabc3bc4f6341dd5973025e53e02e84
2020-12-17 20:55:28 +01:00
q3k 9708ba02ec Merge "cluster: use static addresses" 2020-12-15 18:53:54 +00:00
implr c726798ef7 edge01: systemd unit for running RIPE Atlas anchor VM
Change-Id: I5d91c3b3075c404af92d40f33a48a487b84ec7a5
2020-12-15 07:05:12 +01:00
q3k acdd665b08 cluster: use static addresses
This disables DHCP on all k0 nodes. This change has been tentatively
deployed to bc01n01 (which is cordoned off in kube), and I will deploy
it to the rest of k0 machines once merged.

Change-Id: I96253a9d0acedb4512c877c64174992ffdb43d58
2020-12-14 19:10:52 +01:00
implr 76de8f860d enable coredumpctl on edge01
Change-Id: Ibed8b4e9f453019e8857ef4e070d7efbcb1f13d4
2020-12-10 08:30:38 +01:00
q3k fc947c5ba3 Merge "minecraft: bump paper to 1.16.4" 2020-12-06 17:58:23 +00:00
q3k 9173333e6c minecraft: bump paper to 1.16.4
Change-Id: I73e799440df07de4bb1bdd31c01d07f4db0f1e2f
2020-12-06 18:57:51 +01:00
q3k da3fc08465 Merge "wow: implement spaceapi" 2020-12-04 09:49:00 +00:00