
1050 Commits (1da87e52095118febbae6d0e0e014fc296eacb72)

Author SHA1 Message Date
q3k 3b67afe81b cluster/certs: refresh
Change-Id: I2aa8fead4427b917afa4758ea0078125d9c4e914
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1153
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-10-07 19:58:35 +00:00
q3k a5b0c13228 edge01: deploy kkc wireguard tunnel (never used)
Change-Id: I5f61f00029ac9e86cd4fdcc390d16ec7fa081f51
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1157
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-10-07 18:50:51 +00:00
q3k 848db46bc0 m6220-proxy: make cli iface into library
Change-Id: Ieededb08a930d7b862575cc569d467cdd93e3e0d
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1156
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-10-07 18:50:27 +00:00
q3k 3943744814 WORKSPACE: reformat, add novnc
Change-Id: I0162f3a704967cac4c20ec23f962a9be5c210490
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1155
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-10-07 18:50:27 +00:00
q3k c429b5385a third_party/go: bump go-netbox
Change-Id: If88259dc10529b45d108c61f1ebfa097844b5bc6
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1154
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-10-07 18:50:27 +00:00
noisersup ea3d34354c testing markdown
Change-Id: I143c04b14d2749dca71278999cd10e13ad2fd355
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1152
2021-09-28 15:08:48 +00:00
noisersup b83779a499 Best server
Change-Id: I3da422644b3eb49d23d94f4ea719e2d0c2b0fb3d
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1151
2021-09-28 15:06:47 +00:00
informatic 94b080d375 devtools/hackdoc: fixup rendering on mobile
Change-Id: If587defdc0bf1d7c5491c328803289b9e75ba918
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1148
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-09-18 20:23:34 +00:00
q3k 9fcce22ef3 bgpwtf/oob: fix markup
Change-Id: I8676fb58ea79d9d37989c1afd03543842cb4fa1b
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1149
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-09-18 11:45:07 +00:00
informatic 77af94df2f app/matrix: add healthchecks, increase generic workers
Change-Id: I1605919d52c69044963082bbf094ff2ece902471
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1147
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-09-16 21:47:39 +00:00
informatic f56db19385 app/matrix: bump synapse to 1.42.0, enable public room browsing
Change-Id: Idf5a2e7bdcff89c0093908b17afc455e2768694b
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1146
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-09-16 21:47:39 +00:00
informatic cf3d8481fd app/matrix: upgrade element-web to v1.8.5
riot-web containers are no longer published.

We shall also readjust our internal naming for matrix web client from
riot to something more generic at some point.

Change-Id: Ice85af3ae29b587c13a3ba27d13c9bd655d7fcfd
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1145
Reviewed-by: informatic <informatic@hackerspace.pl>
2021-09-16 18:57:08 +00:00
informatic 21c8cd6833 app/matrix/matrix.hackerspace.pl: finish matrix-media-repo rollout
Change-Id: I7acc34c82c8ffe1334bb9201b993a410eb517b63
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1144
Reviewed-by: informatic <informatic@hackerspace.pl>
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-09-16 18:57:08 +00:00
q3k ebe6075556 app/matrix: media repo proxy init
This implements media-repo-proxy, a lil' bit of Go to make our
infrastructure work with matrix-media-repo's concept of Host headers.

For some reason, MMR really wants Host: hackerspace.pl instead of Host:
matrix.hackerspace.pl. We'd fix that in their code, but with no tests
and with complex config reload logic it looks very daunting. We'd just
fix that in our Ingress, but that's not easy (no per-rule host
overrides).

So, we commit a tiny little itty bitty war crime and implement a piece
of Go code that serves as a rewriter for this.

This works, tested on boston:

    $ curl -H "Host: matrix.hackerspace.pl" 10.10.12.46:8080/_matrix/media/r0/download/hackerspace.pl/EwVBulPgCWDWNGMKjcOKGGbk | file -
    /dev/stdin: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, baseline, precision 8, 650x300, components 3

(this address is media-repo.matrix.svc.k0.hswaw.net)

But hey, at least it has tests.

Change-Id: Ib6af1988fe8e112c9f3a5577506b18b48d80af62
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1143
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-09-16 18:57:08 +00:00
informatic 8b9c8f9a03 app/matrix/matrix.hackerspace.pl: deploy matrix-media-repo
Change-Id: If80335595190cf2e22cc2ef5d5f305b70e09d5d7
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1142
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-09-16 18:57:08 +00:00
informatic 122d5e5864 app/matrix: matrix-media-repo RGW-based media storage
Change-Id: I459bd78eee52fd349a16f31a48346d3258ef50a4
Reviewed-on: https://gerrit.hackerspace.pl/c/hscloud/+/1081
Reviewed-by: q3k <q3k@hackerspace.pl>
2021-09-15 21:12:34 +00:00
informatic 0e6c6720d9 Merge "app/matrix/matrix.hackerspace.pl: pin synapse media-worker container version" 2021-09-14 20:58:53 +00:00
informatic e839f95079 cluster/kube/k0: add matrix and informatic personal ceph users
Change-Id: Ied8d474709b8053e9fc339435d3ca1ca5fdfa710
2021-09-14 22:21:22 +02:00
informatic 2e191eae7b app/matrix/matrix.hackerspace.pl: pin synapse media-worker container version
We keep this pinned to an older version to prevent unneeded media container
restarts.

Change-Id: I221237d3f88720779572fd972e8ada65e829864d
2021-09-14 22:19:44 +02:00
informatic dcb131fdc2 Merge "app/matrix: appservice-irc v0.29.0 upgrade" 2021-09-14 20:19:15 +00:00
informatic 91faf5bc3d Merge "shell.nix: add missing gnupg" 2021-09-14 20:19:07 +00:00
q3k 719ab840c5 Merge changes Ia92c99e1,I4dca55a7,I4ed014d2,I96c3c18b,I08e70425
* changes:
  cluster/kube: always enable flexdriver
  cluster: k0: move ceph-waw3 to proper realm/zonegroup
  cluster/nix: k0: enable rgw on osds
  cluster: k0: upgrade to ceph 16.2.5
  cluster: k0: bump rook to 1.6
2021-09-14 19:53:17 +00:00
q3k 4b8ee32246 cluster/kube: always enable flexdriver
Documentation says [1] this is disabled by default in 1.1, but that
documentation kinda lies [2].

[1] - 235d5a384b/Documentation/flexvolume.md (ceph-flexvolume-configuration)

[2] - 64e28af741 (diff-d1eb5cba50e3770b61ccd3c730cd40514053e1da0233dfe09b5e7967e76a2a6cL424-L425)

Change-Id: Ia92c99e137ed751db62c0f56d42c4901986d0bb8
2021-09-14 21:39:39 +02:00
q3k 38f72fe094 cluster: k0: move ceph-waw3 to proper realm/zonegroup
With this we can use Ceph's multi-site support to easily migrate to our
new k0 Ceph cluster.

This migration was done by using radosgw-admin to rename the existing
realm/zonegroup to the new names (hscloud and eu), and then reworking
the jsonnet so that the Rook operator would effectively do nothing.

It sounds weird that creating a bunch of CRs like
Object{Realm,ZoneGroup,Zone} would be a no-op for the operator,
but that's how Rook works - a CephObjectStore generally creates
everything that the above CRs would create too, but implicitly. Adding
the extra CRs just allows specifying extra settings, like names.

(it wasn't fully a no-op, as the rgw daemon is parametrized by
realm/zonegroup/zone names, so that had to be restarted)

We also make the radosgw serve under object.ceph-eu.hswaw.net, which
allows us to right away start using a zonegroup URL instead of the
zone-only URL.

Change-Id: I4dca55a705edb3bd28e54f50982c85720a17b877
2021-09-14 21:39:39 +02:00
q3k 18084c1e86 cluster/nix: k0: enable rgw on osds
This enables radosgw wherever OSDs are. This should be fast and works
for us because we have few OSD hosts.

Change-Id: I4ed014d2790d6c02a2ba8e775aaa1846032dee1e
2021-09-14 21:39:39 +02:00
q3k 085a8ff247 cluster: k0: upgrade to ceph 16.2.5
This was fun. See b/6 for a log of how swimmingly this went.

Change-Id: I96c3c18b5d33ef86523b3506f49a390419e9ca7f
2021-09-14 21:39:39 +02:00
q3k 464fb04f39 cluster: k0: bump rook to 1.6
This is needed to get Rook to talk to an external Ceph 16/Pacific
cluster.

This is mostly a bunch of CRD/RBAC changes. Most notably, we yeet our
own CRD rewrite and just slurp in upstream CRD defs.

Change-Id: I08e7042585722ae4440f97019a5212d6cf733fcc
2021-09-14 21:39:37 +02:00
informatic 0f26c4afbc app/matrix: appservice-irc v0.29.0 upgrade
Change-Id: I5b09b3e861442c0b8579abdbeff8983ab1ec0208
2021-09-14 20:00:42 +02:00
informatic 0c59cb33af shell.nix: add missing gnupg
This should fix secretstore on NixOS

Change-Id: Id86b0e920bef82f08a67a84e59d37d6f8737d83f
2021-09-14 20:00:42 +02:00
informatic 5cc64bf60e Merge "app/matrix: bump synapse to 1.37.1" 2021-09-14 17:51:07 +00:00
informatic 013c159dfe Merge "shell.nix: add missing tools" 2021-09-14 16:43:21 +00:00
informatic cb9cbb3fcc shell.nix: add missing tools
Some tools were taken from the "host" shell/PATH, which crashed in certain
cases due to libc incompatibility.

Fixes b/50

Change-Id: Ie94e2c064afff6d5aa782f70e0a024365079e4c7
2021-09-14 18:37:10 +02:00
q3k 92c8dc6532 Merge "kartongips: paper over^W^Wfix CRD updates" 2021-09-12 22:11:11 +00:00
q3k 6c88de9dd7 Merge "cluster/nix: symlink /sbin/lvm" 2021-09-12 22:11:07 +00:00
q3k c793538b58 Merge "cluster: deploy NixOS-based ceph" 2021-09-12 00:56:12 +00:00
q3k 6579e842b0 kartongips: paper over^W^Wfix CRD updates
Ceph CRD updates would fail with:

  ERROR Error updating customresourcedefinitions cephclusters.ceph.rook.io: expected kind, but got map

This wasn't just https://github.com/bitnami/kubecfg/issues/259 . We pull
in the 'solution' from Pulumi
(https://github.com/pulumi/pulumi-kubernetes/pull/622) which just
retries the update via a JSON update instead, and that seems to have
worked.

We also add some better error return wrapping, which I used to debug
this issue properly.

Oof.

Change-Id: I2007a7857e44128d74760174b61b59efa58e9cbc
2021-09-11 20:54:34 +00:00
q3k 9cfc2a0e43 kube.libsonnet: refactor OpenAPI lib, support extra types
This was to be used by a Ceph CRD bump, but we ended up using upstream
yaml instead. But it's a useful change regardless.

I really should document this and write some tests.

Change-Id: I27ce94c6ebe50a4a93baa83418e8d40004755231
2021-09-11 20:49:51 +00:00
q3k 05c4b5515b cluster/nix: symlink /sbin/lvm
This is needed by the new Rook OSD daemons.

Change-Id: I16eb24332db40a8209e7eb9747a81fa852e5cad9
2021-09-11 20:45:45 +00:00
q3k 9848e7e15f cluster: deploy NixOS-based ceph
First pass at a non-rook-managed Ceph cluster. We call it k0 instead of
ceph-waw4, as we pretty much are sure now that we will always have a
one-kube-cluster-to-one-ceph-cluster correspondence, with different Ceph
pools for different media kinds (if at all).

For now this has one mon and spinning rust OSDs. This can be iterated on
to make it less terrible with time.

See b/6 for more details.

Change-Id: Ie502a232c700af93f33fcad9fa1c57058161aa11
2021-09-11 20:33:24 +00:00
q3k 1dbefed537 Merge "cluster/kube: remove ceph diff against k0 production" 2021-09-11 20:32:57 +00:00
q3k 9f639694ba Merge "kartongips: switch default diff behaviour to subset, nag users" 2021-09-11 20:18:34 +00:00
q3k 29f314b620 Merge "kartongips: implement proper diffing of aggregated ClusterRoles" 2021-09-11 20:18:28 +00:00
q3k 4f0468fa26 cluster/kube: remove ceph diff against k0 production
This now has a zero diff against prod.

location fields in CephCluster.storage.nodes seem to have been removed
from the CRD at some point. Not sure how the CRUSH tree now gets
populated, but whatever, it's been working like this for a while
already. Same for CephObjectStore.gateway.type.

The Rook Operator has been zero-scaled for a while now due to b/6.

Change-Id: I30a836f273f4c1529f60fa9297c96b7aac412f59
2021-09-11 12:43:53 +00:00
q3k 59c8149df4 kartongips: switch default diff behaviour to subset, nag users
Change-Id: I998cdf7e693f6d1ce86c7ea411f47320d72a5906
2021-09-11 12:43:50 +00:00
q3k 72d7574536 kartongips: implement proper diffing of aggregated ClusterRoles
For a while now we've had spurious diffs against Ceph on k0 because of
a ClusterRole with an aggregationRule.

The way these behave is that the config object has an empty rule list,
and instead populates an aggregationRule which combines other existing
ClusterRoles into that ClusterRole. The control plane then populates the
rule field when the object is read/acted on, which caused us to always
see a diff between the configuration of that ClusterRole.

This hacks together a hardcoded fix for this particular behaviour.
Porting kubecfg over to SSA would probably also fix this - but that's
too much work for now.

Change-Id: I357c1417d4023691e5809f1af23f58f364353388
2021-09-11 12:40:18 +00:00
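The spurious-diff problem above — the API server fills in `rules` for any ClusterRole that has an `aggregationRule`, while the config leaves `rules` empty — suggests a simple normalization before diffing. This is a hedged sketch of that idea using plain unstructured maps; the function name `stripAggregatedRules` is hypothetical and the actual kartongips hardcoded fix may differ.

```go
package main

import "fmt"

// stripAggregatedRules drops the control-plane-populated "rules" field
// from a ClusterRole that carries an aggregationRule, so a live-vs-config
// comparison doesn't report a diff the user never wrote. Hypothetical
// sketch of the behaviour described in the commit above.
func stripAggregatedRules(obj map[string]interface{}) {
	if obj["kind"] != "ClusterRole" {
		return
	}
	if _, aggregated := obj["aggregationRule"]; aggregated {
		// The server derives rules from the aggregated ClusterRoles on
		// read; remove them before diffing against the (empty) config.
		delete(obj, "rules")
	}
}

func main() {
	live := map[string]interface{}{
		"kind":            "ClusterRole",
		"aggregationRule": map[string]interface{}{},
		"rules":           []interface{}{"server-populated"},
	}
	stripAggregatedRules(live)
	fmt.Println(live)
}
```

As the commit notes, server-side apply would make this normalization unnecessary, since field ownership would attribute `rules` to the control plane.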
q3k d592e6836d Merge "ops, cluster: consolidate NixOS provisioning" 2021-09-11 10:38:43 +00:00
implr 7f7dcd9847 Merge "nix: upgrade readTree" 2021-09-11 10:19:03 +00:00
implr 56ff18c486 nix: upgrade readTree
Change-Id: I460800dc3d8095e2ae89b8bd6ed7c5f0c90b6ccf
2021-09-11 12:18:04 +02:00
q3k b3c6770f8d ops, cluster: consolidate NixOS provisioning
This moves the diff-and-activate logic from cluster/nix/provision.nix
into ops/{provision,machines}.nix that can be used for both cluster
machines and bgpwtf machines.

The provisioning scripts now live per-NixOS-config, and anything under
ops.machines.$fqdn now has a .passthru.hscloud.provision derivation
which is that script. When ran, it will attempt to deploy onto the
target machine.

There's also a top-level tool at `ops.provision` which builds all
configurations / machines and can be called with the machine name/fqdn
to call the corresponding provisioner script.

clustercfg is changed to use the new provisioning logic.

Change-Id: I258abce9e8e3db42af35af102f32ab7963046353
2021-09-10 23:55:52 +00:00
q3k 69ff6038d5 shell.nix: colorful prompt
https://object.ceph-waw3.hswaw.net/q3k-personal/815968ff10071d4192e464c91b64228e760128267311a94872006d87cbfd0bd9.png

Change-Id: Ia4eeddf045af0d0bdc962087aaeed55d11846648
2021-09-10 23:15:38 +00:00