Commit Graph

992 Commits (9fcce22ef34a9cc20f4837b9b5780db487c2cb8b)

Author SHA1 Message Date
q3k 9fcce22ef3 bgpwtf/oob: fix markup
Change-Id: I8676fb58ea79d9d37989c1afd03543842cb4fa1b
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1149
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-09-18 11:45:07 +00:00
informatic 77af94df2f app/matrix: add healthchecks, increase generic workers
Change-Id: I1605919d52c69044963082bbf094ff2ece902471
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1147
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-09-16 21:47:39 +00:00
informatic f56db19385 app/matrix: bump synapse to 1.42.0, enable public room browsing
Change-Id: Idf5a2e7bdcff89c0093908b17afc455e2768694b
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1146
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-09-16 21:47:39 +00:00
informatic cf3d8481fd app/matrix: upgrade element-web to v1.8.5
riot-web containers are no longer published.

We should also adjust our internal naming for the matrix web client from
riot to something more generic at some point.

Change-Id: Ice85af3ae29b587c13a3ba27d13c9bd655d7fcfd
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1145
Reviewed-by: informatic <informatic@hackerspace.pl>
2021-09-16 18:57:08 +00:00
informatic 21c8cd6833 app/matrix/matrix.hackerspace.pl: finish matrix-media-repo rollout
Change-Id: I7acc34c82c8ffe1334bb9201b993a410eb517b63
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1144
Reviewed-by: informatic <informatic@hackerspace.pl>
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-09-16 18:57:08 +00:00
q3k ebe6075556 app/matrix: media repo proxy init
This implements media-repo-proxy, a lil' bit of Go to make our
infrastructure work with matrix-media-repo's concept of Host headers.

For some reason, MMR really wants Host: hackerspace.pl instead of Host:
matrix.hackerspace.pl. We'd fix that in their code, but with no tests
and with complex config reload logic it looks very daunting. We'd just
fix that in our Ingress, but that's not easy (no per-rule host
overrides).

So, we commit a tiny little itty bitty war crime and implement a piece
of Go code that serves as a rewriter for this.

This works, tested on boston:

    $ curl -H "Host: matrix.hackerspace.pl" 10.10.12.46:8080/_matrix/media/r0/download/hackerspace.pl/EwVBulPgCWDWNGMKjcOKGGbk | file -
    /dev/stdin: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, baseline, precision 8, 650x300, components 3

(this address is media-repo.matrix.svc.k0.hswaw.net)
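
The Host rewrite itself boils down to very little Go. As a minimal
sketch of the idea (not the actual media-repo-proxy source; the
upstream address below is made up):

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        // Hypothetical upstream; the real proxy points at matrix-media-repo.
        upstream, err := url.Parse("http://127.0.0.1:8000")
        if err != nil {
            log.Fatal(err)
        }
        proxy := httputil.NewSingleHostReverseProxy(upstream)
        director := proxy.Director
        proxy.Director = func(r *http.Request) {
            director(r)
            // matrix-media-repo wants the bare domain, not the
            // matrix.hackerspace.pl vhost that clients actually hit.
            r.Host = "hackerspace.pl"
        }
        log.Fatal(http.ListenAndServe(":8080", proxy))
    }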

But hey, at least it has tests.

Change-Id: Ib6af1988fe8e112c9f3a5577506b18b48d80af62
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1143
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-09-16 18:57:08 +00:00
informatic 8b9c8f9a03 app/matrix/matrix.hackerspace.pl: deploy matrix-media-repo
Change-Id: If80335595190cf2e22cc2ef5d5f305b70e09d5d7
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1142
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-09-16 18:57:08 +00:00
informatic 122d5e5864 app/matrix: matrix-media-repo RGW-based media storage
Change-Id: I459bd78eee52fd349a16f31a48346d3258ef50a4
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1081
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-09-15 21:12:34 +00:00
informatic 0e6c6720d9 Merge "app/matrix/matrix.hackerspace.pl: pin synapse media-worker container version" 2021-09-14 20:58:53 +00:00
informatic e839f95079 cluster/kube/k0: add matrix and informatic personal ceph users
Change-Id: Ied8d474709b8053e9fc339435d3ca1ca5fdfa710
2021-09-14 22:21:22 +02:00
informatic 2e191eae7b app/matrix/matrix.hackerspace.pl: pin synapse media-worker container version
We keep this pinned to an older version to prevent unneeded media container
restarts.

Change-Id: I221237d3f88720779572fd972e8ada65e829864d
2021-09-14 22:19:44 +02:00
informatic dcb131fdc2 Merge "app/matrix: appservice-irc v0.29.0 upgrade" 2021-09-14 20:19:15 +00:00
informatic 91faf5bc3d Merge "shell.nix: add missing gnupg" 2021-09-14 20:19:07 +00:00
q3k 719ab840c5 Merge changes Ia92c99e1,I4dca55a7,I4ed014d2,I96c3c18b,I08e70425
* changes:
  cluster/kube: always enable flexdriver
  cluster: k0: move ceph-waw3 to proper realm/zonegroup
  cluster/nix: k0: enable rgw on osds
  cluster: k0: upgrade to ceph 16.2.5
  cluster: k0: bump rook to 1.6
2021-09-14 19:53:17 +00:00
q3k 4b8ee32246 cluster/kube: always enable flexdriver
Documentation says [1] this is disabled by default in 1.1, but that
documentation kinda lies [2].

[1] - 235d5a384b/Documentation/flexvolume.md (ceph-flexvolume-configuration)

[2] - 64e28af741 (diff-d1eb5cba50e3770b61ccd3c730cd40514053e1da0233dfe09b5e7967e76a2a6cL424-L425)

Change-Id: Ia92c99e137ed751db62c0f56d42c4901986d0bb8
2021-09-14 21:39:39 +02:00
q3k 38f72fe094 cluster: k0: move ceph-waw3 to proper realm/zonegroup
With this we can use Ceph's multi-site support to easily migrate to our
new k0 Ceph cluster.

This migration was done by using radosgw-admin to rename the existing
realm/zonegroup to the new names (hscloud and eu), and then reworking
the jsonnet so that the Rook operator would effectively do nothing.

It sounds weird that creating a bunch of CRs like
Object{Realm,ZoneGroup,Zone} would be a no-op for the operator,
but that's how Rook works - a CephObjectStore generally creates
everything that the above CRs would create too, but implicitly. Adding
the extra CRs just allows specifying extra settings, like names.

(it wasn't fully a no-op, as the rgw daemon is parametrized by
realm/zonegroup/zone names, so that had to be restarted)

We also make the radosgw serve under object.ceph-eu.hswaw.net, which
allows us to right away start using a zonegroup URL instead of the
zone-only URL.

Change-Id: I4dca55a705edb3bd28e54f50982c85720a17b877
2021-09-14 21:39:39 +02:00
q3k 18084c1e86 cluster/nix: k0: enable rgw on osds
This enables radosgw wherever osds are. This should be fast and works
for us because we have few osd hosts.

Change-Id: I4ed014d2790d6c02a2ba8e775aaa1846032dee1e
2021-09-14 21:39:39 +02:00
q3k 085a8ff247 cluster: k0: upgrade to ceph 16.2.5
This was fun. See b/6 for a log of how swimmingly this went.

Change-Id: I96c3c18b5d33ef86523b3506f49a390419e9ca7f
2021-09-14 21:39:39 +02:00
q3k 464fb04f39 cluster: k0: bump rook to 1.6
This is needed to get Rook to talk to an external Ceph 16/Pacific
cluster.

This is mostly a bunch of CRD/RBAC changes. Most notably, we yeet our
own CRD rewrite and just slurp in upstream CRD defs.

Change-Id: I08e7042585722ae4440f97019a5212d6cf733fcc
2021-09-14 21:39:37 +02:00
informatic 0f26c4afbc app/matrix: appservice-irc v0.29.0 upgrade
Change-Id: I5b09b3e861442c0b8579abdbeff8983ab1ec0208
2021-09-14 20:00:42 +02:00
informatic 0c59cb33af shell.nix: add missing gnupg
This should fix secretstore on NixOS.

Change-Id: Id86b0e920bef82f08a67a84e59d37d6f8737d83f
2021-09-14 20:00:42 +02:00
informatic 5cc64bf60e Merge "app/matrix: bump synapse to 1.37.1" 2021-09-14 17:51:07 +00:00
informatic 013c159dfe Merge "shell.nix: add missing tools" 2021-09-14 16:43:21 +00:00
informatic cb9cbb3fcc shell.nix: add missing tools
Some tools were taken from the "host" shell/PATH, which crashed in
certain cases due to libc incompatibility.

Fixes b/50

Change-Id: Ie94e2c064afff6d5aa782f70e0a024365079e4c7
2021-09-14 18:37:10 +02:00
q3k 92c8dc6532 Merge "kartongips: paper over^W^Wfix CRD updates" 2021-09-12 22:11:11 +00:00
q3k 6c88de9dd7 Merge "cluster/nix: symlink /sbin/lvm" 2021-09-12 22:11:07 +00:00
q3k c793538b58 Merge "cluster: deploy NixOS-based ceph" 2021-09-12 00:56:12 +00:00
q3k 6579e842b0 kartongips: paper over^W^Wfix CRD updates
Ceph CRD updates would fail with:

  ERROR Error updating customresourcedefinitions cephclusters.ceph.rook.io: expected kind, but got map

This wasn't just https://github.com/bitnami/kubecfg/issues/259 . We pull
in the 'solution' from Pulumi
(https://github.com/pulumi/pulumi-kubernetes/pull/622) which just
retries the update via a JSON update instead, and that seems to have
worked.
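
One way to picture the fallback (a sketch only; it assumes the retry is
done as a JSON merge patch through the dynamic client, which is not
necessarily how kartongips wires it up):

    package crdupdate

    import (
        "context"
        "encoding/json"
        "strings"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
        "k8s.io/apimachinery/pkg/types"
        "k8s.io/client-go/dynamic"
    )

    // updateWithJSONFallback tries a normal update and, if that fails with
    // the "expected kind, but got map" error, re-submits the same object as
    // a JSON merge patch instead.
    func updateWithJSONFallback(ctx context.Context, rc dynamic.ResourceInterface, obj *unstructured.Unstructured) (*unstructured.Unstructured, error) {
        res, err := rc.Update(ctx, obj, metav1.UpdateOptions{})
        if err == nil || !strings.Contains(err.Error(), "expected kind, but got map") {
            return res, err
        }
        data, merr := json.Marshal(obj.Object)
        if merr != nil {
            return nil, merr
        }
        return rc.Patch(ctx, obj.GetName(), types.MergePatchType, data, metav1.PatchOptions{})
    }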

We also add some better error return wrapping, which I used to debug
this issue properly.

Oof.

Change-Id: I2007a7857e44128d74760174b61b59efa58e9cbc
2021-09-11 20:54:34 +00:00
q3k 9cfc2a0e43 kube.libsonnet: refactor OpenAPI lib, support extra types
This was to be used by a Ceph CRD bump, but we ended up using upstream
yaml instead. But it's a useful change regardless.

I really should document this and write some tests.

Change-Id: I27ce94c6ebe50a4a93baa83418e8d40004755231
2021-09-11 20:49:51 +00:00
q3k 05c4b5515b cluster/nix: symlink /sbin/lvm
This is needed by the new Rook OSD daemons.

Change-Id: I16eb24332db40a8209e7eb9747a81fa852e5cad9
2021-09-11 20:45:45 +00:00
q3k 9848e7e15f cluster: deploy NixOS-based ceph
First pass at a non-rook-managed Ceph cluster. We call it k0 instead of
ceph-waw4, as we are pretty much sure now that we will always have a
one-kube-cluster-to-one-ceph-cluster correspondence, with different Ceph
pools for different media kinds (if at all).

For now this has one mon and spinning rust OSDs. This can be iterated on
to make it less terrible with time.

See b/6 for more details.

Change-Id: Ie502a232c700af93f33fcad9fa1c57058161aa11
2021-09-11 20:33:24 +00:00
q3k 1dbefed537 Merge "cluster/kube: remove ceph diff against k0 production" 2021-09-11 20:32:57 +00:00
q3k 9f639694ba Merge "kartongips: switch default diff behaviour to subset, nag users" 2021-09-11 20:18:34 +00:00
q3k 29f314b620 Merge "kartongips: implement proper diffing of aggregated ClusterRoles" 2021-09-11 20:18:28 +00:00
q3k 4f0468fa26 cluster/kube: remove ceph diff against k0 production
This now has a zero diff against prod.

location fields in CephCluster.storage.nodes seem to have been removed
from the CRD at some point. Not sure how the CRUSH tree now gets
populated, but whatever, it's been working like this for a while
already. Same for CephObjectStore.gateway.type.

The Rook Operator has been zero-scaled for a while now due to b/6.

Change-Id: I30a836f273f4c1529f60fa9297c96b7aac412f59
2021-09-11 12:43:53 +00:00
q3k 59c8149df4 kartongips: switch default diff behaviour to subset, nag users
Change-Id: I998cdf7e693f6d1ce86c7ea411f47320d72a5906
2021-09-11 12:43:50 +00:00
q3k 72d7574536 kartongips: implement proper diffing of aggregated ClusterRoles
For a while now we've had spurious diffs against Ceph on k0 because of
a ClusterRole with an aggregationRule.

The way these behave is that the config object has an empty rules list,
and instead populates an aggregationRule which combines other existing
ClusterRoles into that ClusterRole. The control plane then populates the
rules field when the object is read/acted on, which caused us to always
see a diff between the configured and live state of that ClusterRole.

This hacks together a hardcoded fix for this particular behaviour.
Porting kubecfg over to SSA would probably also fix this - but that's
too much work for now.
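
Conceptually the hack amounts to something like the sketch below
(illustrative only, not the actual kartongips diffing code): when a
ClusterRole carries an aggregationRule, drop the server-populated rules
from the live object before comparing.

    package differ

    // stripAggregatedRules drops the .rules field from a live ClusterRole
    // that carries an aggregationRule, since the control plane fills that
    // field in by itself and it will never match the (empty) configured one.
    func stripAggregatedRules(live map[string]interface{}) map[string]interface{} {
        if live["kind"] != "ClusterRole" {
            return live
        }
        if _, ok := live["aggregationRule"]; !ok {
            return live
        }
        out := make(map[string]interface{}, len(live))
        for k, v := range live {
            if k == "rules" {
                continue
            }
            out[k] = v
        }
        return out
    }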

Change-Id: I357c1417d4023691e5809f1af23f58f364353388
2021-09-11 12:40:18 +00:00
q3k d592e6836d Merge "ops, cluster: consolidate NixOS provisioning" 2021-09-11 10:38:43 +00:00
implr 7f7dcd9847 Merge "nix: upgrade readTree" 2021-09-11 10:19:03 +00:00
implr 56ff18c486 nix: upgrade readTree
Change-Id: I460800dc3d8095e2ae89b8bd6ed7c5f0c90b6ccf
2021-09-11 12:18:04 +02:00
q3k b3c6770f8d ops, cluster: consolidate NixOS provisioning
This moves the diff-and-activate logic from cluster/nix/provision.nix
into ops/{provision,machines}.nix that can be used for both cluster
machines and bgpwtf machines.

The provisioning scripts now live per-NixOS-config, and anything under
ops.machines.$fqdn now has a .passthru.hscloud.provision derivation
which is that script. When ran, it will attempt to deploy onto the
target machine.

There's also a top-level tool at `ops.provision` which builds all
configurations / machines and can be called with the machine name/fqdn
to call the corresponding provisioner script.

clustercfg is changed to use the new provisioning logic.

Change-Id: I258abce9e8e3db42af35af102f32ab7963046353
2021-09-10 23:55:52 +00:00
q3k 69ff6038d5 shell.nix: colorful prompt
https://object.ceph-waw3.hswaw.net/q3k-personal/815968ff10071d4192e464c91b64228e760128267311a94872006d87cbfd0bd9.png

Change-Id: Ia4eeddf045af0d0bdc962087aaeed55d11846648
2021-09-10 23:15:38 +00:00
q3k eed9afe210 Merge "bgpwtf: edge01: fix ipv4 static routing for customers" 2021-09-10 22:45:41 +00:00
arsenicum aef13358c8 personal - start
Change-Id: I0f1972a095b5a41cad727dbc37fcd454d308050d
2021-09-09 18:26:33 +02:00
q3k 81e7fbaadd bgpwtf: edge01: fix ipv4 static routing for customers
Change-Id: I9c34d12a7947c9bb25331e38ea7ee03beede7e47
2021-09-08 23:40:29 +02:00
q3k 11248d88ab bgpwtf: edge01: add new client networks, remove old q3k network, limit nscd
Batch of small changes. Already deployed.

Change-Id: Ieb4f418699f497c7013e617fd7d1827e71a7a415
2021-09-06 12:07:42 +00:00
q3k 0f11b3c850 hswaw/site: deploy
Change-Id: I3c8aff05f339f3154cb80831099482f0d97a360e
2021-09-04 21:32:30 +02:00
q3k 62e50da881 Merge "tweak blink animation & add gallery" 2021-09-04 18:41:07 +00:00
q3k 5001851808 Merge "hswaw/site: fix twitter link" 2021-09-04 18:40:50 +00:00
q3k d0c9c414cf hswaw/site: deploy
Change-Id: I2ea68f07c81859ffea99ad5b107b14876422288b
2021-09-04 18:38:42 +00:00