This is the first pass at an ident protocol client. In the end, we want
to implement an ident protocol server for our in-cluster identd, but
starting out with a client helps me get familiar with the protocol,
and will allow the server implementation to be tested against the
client.
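
For reference, the exchange such a client performs (RFC 1413) can be
sketched roughly as below. This is a minimal sketch and not the code
from this change: identReply, parseIdentResponse and queryIdent are
hypothetical names, and trimming whitespace around the user ID is a
convenience of the sketch rather than something the RFC mandates.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
	"time"
)

// identReply is a hypothetical container for a parsed RFC 1413 response.
type identReply struct {
	ServerPort int
	ClientPort int
	OS         string // set for USERID responses
	UserID     string // set for USERID responses
	Error      string // set for ERROR responses
}

// parseIdentResponse parses a single response line, for example
// "6193, 23 : USERID : UNIX : stjohns".
func parseIdentResponse(line string) (*identReply, error) {
	line = strings.TrimRight(line, "\r\n")
	// Split into at most four fields, as the user ID may contain colons.
	parts := strings.SplitN(line, ":", 4)
	if len(parts) < 3 {
		return nil, fmt.Errorf("malformed ident response %q", line)
	}
	r := &identReply{}
	ports := strings.ReplaceAll(parts[0], " ", "")
	if _, err := fmt.Sscanf(ports, "%d,%d", &r.ServerPort, &r.ClientPort); err != nil {
		return nil, fmt.Errorf("malformed port pair %q: %w", parts[0], err)
	}
	switch kind := strings.TrimSpace(parts[1]); kind {
	case "USERID":
		if len(parts) != 4 {
			return nil, fmt.Errorf("USERID response without user ID: %q", line)
		}
		r.OS = strings.TrimSpace(parts[2])
		// Sketch-only simplification: surrounding whitespace is dropped.
		r.UserID = strings.TrimSpace(parts[3])
	case "ERROR":
		r.Error = strings.TrimSpace(strings.Join(parts[2:], ":"))
	default:
		return nil, fmt.Errorf("unknown response type %q", kind)
	}
	return r, nil
}

// queryIdent asks the identd at addr (usually host:113) who owns the
// connection identified by the given port pair.
func queryIdent(addr string, serverPort, clientPort int) (*identReply, error) {
	conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	conn.SetDeadline(time.Now().Add(5 * time.Second))
	if _, err := fmt.Fprintf(conn, "%d,%d\r\n", serverPort, clientPort); err != nil {
		return nil, err
	}
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return nil, err
	}
	return parseIdentResponse(line)
}

func main() {
	r, err := parseIdentResponse("6193, 23 : USERID : UNIX : stjohns\r\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *r)
}
```

Keeping the parser separate from the network code means it can be
unit-tested without a running identd.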
Change-Id: Ic37b84577321533bab2f2fbf7fb53409a5defb95
These can be used by production jobs to get the source port of the
client connecting over HTTP. A followup CR implements just that.
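
As an illustration of the idea only (not the helpers added here), the
source port of an HTTP client can be recovered from the standard
library's http.Request.RemoteAddr. sourcePort and handler are
hypothetical names used for this sketch.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strconv"
)

// sourcePort extracts the TCP source port from a "host:port" address,
// which is the form http.Request.RemoteAddr takes.
func sourcePort(remoteAddr string) (uint16, error) {
	_, portStr, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return 0, fmt.Errorf("invalid remote address %q: %w", remoteAddr, err)
	}
	port, err := strconv.ParseUint(portStr, 10, 16)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", portStr, err)
	}
	return uint16(port), nil
}

// handler shows how a production job might use sourcePort.
func handler(w http.ResponseWriter, r *http.Request) {
	port, err := sourcePort(r.RemoteAddr)
	if err != nil {
		http.Error(w, "cannot determine source port", http.StatusInternalServerError)
		return
	}
	fmt.Fprintf(w, "you connected from port %d\n", port)
}

func main() {
	// 192.0.2.1 is a documentation address, used here as example input.
	port, err := sourcePort("192.0.2.1:54321")
	fmt.Println(port, err)
}
```

Note that RemoteAddr describes the direct TCP peer, so behind a proxy or
ingress it is the proxy's port, not the end user's.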
Change-Id: Ic8e29eaf806bb196d8cfcfb604ff66ae4d0d166a
This emits short-lived user credentials for a `dev-user` in crdb-waw1
any time someone prodaccesses.
Change-Id: I0266a05c1f02225d762cfd2ca61976af0658639d
DeveloperCredentialsLocation used to glog.Exitf instead of returning an
error, and a consumer (prodaccess) used to not check the return code.
Bad refactor?
Change-Id: I6c2d05966ba6b3eb300c24a51584ccf5e324cd49
This fixes CVE-2021-3450 and CVE-2021-3449.
Deployed on prod:
    $ kubectl -n nginx-system exec nginx-ingress-controller-5c69c5cb59-2f8v4 -- openssl version
    OpenSSL 1.1.1k 25 Mar 2021
Change-Id: I7115fd2367cca7b687c555deb2134b22d19a291a
Each OSD is connected to a 6TB drive, and with the good ol' 1TB storage
-> 1GB RAM rule of thumb for OSDs, we end up with 6GB. Or, to round up,
8GB.
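
Spelled out, the arithmetic is as follows. This is a small sketch:
osdMemoryGB is a hypothetical helper, and reading "round up" as "next
power of two" is an assumption on my part.

```go
package main

import "fmt"

// osdMemoryGB applies the 1TB storage -> 1GB RAM rule of thumb and then
// rounds up to the next power of two (assumed rounding policy).
func osdMemoryGB(driveTB uint) uint {
	need := driveTB // 1GB of RAM per 1TB of storage
	rounded := uint(1)
	for rounded < need {
		rounded *= 2
	}
	return rounded
}

func main() {
	fmt.Println(osdMemoryGB(6)) // 6TB drive -> 6GB -> rounded up to 8
}
```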
I'm doing this because over the past few weeks OSDs in ceph-waw3 have
been using a _ton_ of RAM. This will probably not prevent that (and
instead they will OOM more often :/), but it will at least prevent us from
wasting resources (k0 started migrating pods to other nodes, and running
full nodes like that without an underlying request makes for a terrible
draining experience).
We need to get to the bottom of why this is happening in the first
place, though. Did this happen as we moved to containerd?
Followup: b.hswaw.net/29
Already deployed to production.
Change-Id: I98df63763c35017eb77595db7b9f2cce71756ed1
This removes Docker and docker-shim from our production kubernetes, and
moves over to containerd/CRI. Docker support within Kubernetes was
always slightly shitty, and with 1.20 the integration was dropped
entirely. CRI/Containerd/runc is pretty much the new standard.
Change-Id: I98c89d5433f221b5fe766fcbef261fd72db530fe
This is an attempt to see how well we do without rules_nixpkgs.
rules_nixpkgs has the following problems:
- complicates our build system significantly (generated external
repository indirection for picking local/nix python and go)
- creates builds that cannot run on production (as they are tainted by
/nix/store libraries)
- is not a full solution to the bazel hermeticity problem anyway, and
we'll have to tackle that some other way (eg. by introducing proper
C++ cross-compilation toolchains and building everything from C,
including Python and Go)
Instead of rules_nixpkgs, we ship a shell.nix file, so NixOS users can
just:
    jane@hacker:~/hscloud $ nix-shell
    hscloud-build-chrootenv:jane@hacker:~/hscloud$ prodaccess
This shell.nix is in a way nicer, as it immediately gives you all tools
needed to access production straight away.
Change-Id: Ieceb5ae0fb4d32e87301e5c99416379cedc900c5
This unifies nixpkgs with the one defined in //default.nix and makes it
possible to use readTree to build the provisioners:
    nix-build -A cluster.nix.provision
    result/bin/provision
Change-Id: I68dd70b9c8869c7c0b59f5007981eac03667b862
This gives any binding to system:admin-namespaces (eg. personal-*
namespaces, per-namespace extra admin access like matrix-0x3c) the
ability to create and update ingresses.
Change-Id: I522896ebe290fe982d6fe46b7b1d604d22b4f72c
This turns admitomatic into a self-standing service that can be used as
an admission controller.
I've tested this E2E on a local k3s server, and have some early test
code for that - but that'll land up in a follow up CR, as it first needs
to be cleaned up.
Change-Id: I46da0fc49f9d1a3a1a96700a36deb82e5057249b
This gives us nearly everything required to run the admission
controller. In addition to checking for allowed domains, we also do some
nginx-ingress-controller security checks.
Change-Id: Ib187de6d2c06c58bd8c320503d4f850df2ec8abd
This is the beginning of a validating admission controller which we will
use to permit end-users access to manage Ingresses.
This first pass implements an ingressFilter, which is the main structure
through which permitted namespace/DNS combinations are declared. It is
currently only exercised via a test, but in the future it will likely be
configured via the command line, or via a serialized protobuf config.
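
To make the idea concrete, such a filter could look like the sketch
below. This is not the type from this change: the method names, the map
layout and the single-label "*." wildcard semantics are all assumptions
made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// ingressFilter (sketch) records which DNS names each namespace may use.
type ingressFilter struct {
	// allowed maps a namespace to the DNS patterns permitted in it.
	allowed map[string][]string
}

// allow permits the given namespace to use names matching pattern.
func (f *ingressFilter) allow(namespace, pattern string) {
	if f.allowed == nil {
		f.allowed = make(map[string][]string)
	}
	f.allowed[namespace] = append(f.allowed[namespace], strings.ToLower(pattern))
}

// domainAllowed reports whether namespace may serve the given domain.
func (f *ingressFilter) domainAllowed(namespace, domain string) bool {
	domain = strings.ToLower(domain)
	for _, p := range f.allowed[namespace] {
		if matchPattern(p, domain) {
			return true
		}
	}
	return false
}

// matchPattern matches either an exact name or a "*.suffix" pattern,
// where the wildcard stands for exactly one DNS label (assumed policy).
func matchPattern(pattern, domain string) bool {
	if !strings.HasPrefix(pattern, "*.") {
		return pattern == domain
	}
	suffix := pattern[1:] // keeps the leading dot, eg. ".example.org"
	if !strings.HasSuffix(domain, suffix) {
		return false
	}
	label := strings.TrimSuffix(domain, suffix)
	return label != "" && !strings.Contains(label, ".")
}

func main() {
	f := &ingressFilter{}
	f.allow("matrix", "matrix.example.org")
	f.allow("personal-jane", "*.jane.example.org")
	fmt.Println(f.domainAllowed("matrix", "matrix.example.org"))
	fmt.Println(f.domainAllowed("personal-jane", "matrix.example.org"))
}
```

Anything not explicitly allowed is denied, which is the safer default
for an admission controller.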
Change-Id: I22dbed633ea8d8e1fa02c2a1598f37f02ea1b309
This change reflects the current production state.
The upgrade was done by going through the following versions:
19.1.0 -> 19.2.12 -> 20.1.10 -> 20.2.4
Change-Id: I8b33b8116363f1a918423fd18ba3d1b5c910851c
It reached the stage of being crapped out so much that the OSDs' spurious
IOPS killed the performance of disks colocated on the same M610 RAID
controllers. This made etcd _very_ slow, to the point of churning
through re-elections due to timeouts.
etcd/apiserver latencies, observe the difference at ~15:38:
https://object.ceph-waw3.hswaw.net/q3k-personal/4fbe8d4cfc8193cad307d487371b4e44358b931a7494aa88aff50b13fae9983c.png
I moved gerrit/* and matrix/appservice-irc-freenode PVCs to ceph-waw3 by
hand. The rest were non-critical so I removed them, they can be
recovered from benji backups if needed.
Change-Id: Iffbe87aefc06d8324a82b958a579143b7dd9914c
More as-builts. This has already been bumped. Had to coax ceph-waw2 to
upgrade despite the fact that it's horribly broken.
Change-Id: Ia762f5d7d88d6420c2fc25cf199037cbccde0cb3
This is after the monster^Wrook outage of the week two weeks ago caused
by bc01n03 dying.
Plan is to migrate ceph-waw3 to be external, yeet ceph-waw2, and extend
crdb-waw1 to another node.
Change-Id: I133af3b1171fea383b45bf06c51e48a5c40341e4
This disables DHCP on all k0 nodes. This change has been tentatively
deployed to bc01n01 (which is cordoned off in kube), and I will deploy
it to the rest of k0 machines once merged.
Change-Id: I96253a9d0acedb4512c877c64174992ffdb43d58
These tests are broken as they depend on some test data that we
currently don't have in hscloud. They should be fixed ASAP.
Change-Id: I2571c2958cb84e145a7e3a44171685ecf43cf499
This forks bitnami/kubecfg into kartongips. The rationale is that we
want to implement hscloud-specific functionality that wouldn't really be
upstreamable into kubecfg (like secret support, multi-cluster support).
We forked off from github.com/q3k/kubecfg at commit b6817a94492c561ed61a44eeea2d92dcf2e6b8c0.
Change-Id: If5ba513905e0a86f971576fe7061a471c1d8b398
We want to be able to scrape controller-manager and scheduler metrics
into Prometheus. For that, each of them needs to:
1) listen on a secure port
2) have authn enabled
With this, any k8s user with the right permissions (and a bearer token
or TLS certificate) can come in and access metrics over a node's public
IP address. Access without a certificate/token gets thrown into the
system:anonymous user, which has no access to any API.
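
From the scraper's side, an authenticated request looks roughly like the
sketch below. newMetricsRequest and scrape are hypothetical helpers; the
host name and token are placeholders, and 10257 is the upstream default
secure port of kube-controller-manager, which may differ from what we
configure.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// newMetricsRequest builds a GET for /metrics carrying a bearer token.
func newMetricsRequest(baseURL, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/metrics", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

// scrape performs the request. The client must trust the cluster CA;
// setting that up is left out of this sketch.
func scrape(client *http.Client, baseURL, token string) (string, error) {
	req, err := newMetricsRequest(baseURL, token)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		// Requests without valid credentials end up as
		// system:anonymous and are rejected.
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	// Placeholder address and token, for illustration only.
	req, err := newMetricsRequest("https://node.example.com:10257", "example-token")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
}
```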
Change-Id: I267680f92f748ba63b6762e6aaba3c417446e50b
This notably fixes the annoying loopback issues that prevented hosts
from accessing externalip services with externalTrafficPolicy: local
from nodes that weren't running the service.
Which means, hopefully, no more registry pull failures when
nginx-ingress gets misplaced!
Change-Id: Id4923fd0fce2e28c31a1e65518b0e984165ca9ec
This has been deployed to k0 nodes.
Current state of cluster certificates:
cluster/certs/ca-etcd.crt
Not After : Apr 4 17:59:00 2024 GMT
cluster/certs/ca-etcdpeer.crt
Not After : Apr 4 17:59:00 2024 GMT
cluster/certs/ca-kube.crt
Not After : Apr 4 17:59:00 2024 GMT
cluster/certs/ca-kubefront.crt
Not After : Apr 4 17:59:00 2024 GMT
cluster/certs/ca-kube-prodvider.cert
Not After : Sep 1 21:30:00 2021 GMT
cluster/certs/etcd-bc01n01.hswaw.net.cert
Not After : Mar 28 15:53:00 2021 GMT
cluster/certs/etcd-bc01n02.hswaw.net.cert
Not After : Mar 28 16:45:00 2021 GMT
cluster/certs/etcd-bc01n03.hswaw.net.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/etcd-calico.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/etcd-dcr01s22.hswaw.net.cert
Not After : Oct 3 15:33:00 2021 GMT
cluster/certs/etcd-dcr01s24.hswaw.net.cert
Not After : Oct 3 15:38:00 2021 GMT
cluster/certs/etcd-kube.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/etcdpeer-bc01n01.hswaw.net.cert
Not After : Mar 28 15:53:00 2021 GMT
cluster/certs/etcdpeer-bc01n02.hswaw.net.cert
Not After : Mar 28 16:45:00 2021 GMT
cluster/certs/etcdpeer-bc01n03.hswaw.net.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/etcdpeer-dcr01s22.hswaw.net.cert
Not After : Oct 3 15:33:00 2021 GMT
cluster/certs/etcdpeer-dcr01s24.hswaw.net.cert
Not After : Oct 3 15:38:00 2021 GMT
cluster/certs/etcd-root.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/kube-apiserver.cert
Not After : Oct 3 15:26:00 2021 GMT
cluster/certs/kube-controllermanager.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/kubefront-apiserver.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/kube-kubelet-bc01n01.hswaw.net.cert
Not After : Mar 28 15:53:00 2021 GMT
cluster/certs/kube-kubelet-bc01n02.hswaw.net.cert
Not After : Mar 28 16:45:00 2021 GMT
cluster/certs/kube-kubelet-bc01n03.hswaw.net.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/kube-kubelet-dcr01s22.hswaw.net.cert
Not After : Oct 3 15:33:00 2021 GMT
cluster/certs/kube-kubelet-dcr01s24.hswaw.net.cert
Not After : Oct 3 15:38:00 2021 GMT
cluster/certs/kube-proxy.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/kube-scheduler.cert
Not After : Mar 28 15:15:00 2021 GMT
cluster/certs/kube-serviceaccounts.cert
Not After : Mar 28 15:15:00 2021 GMT
Change-Id: I94030ce78c10f7e9a0c0257d55145ef629195314
This prevents metallb routes being announced from all peers to our ToR,
thereby preventing issues with traffic hitting services with
externalTrafficPolicy: local.
There still is the from-host loopback issue, but that will be fixed by
upgrading to kube 1.15.
Change-Id: Ifc9964b46840aee82d99f0b6550188550e46fe04
This fixes compatibility with prodaccess tools built with Go 1.15, which
introduced 'X.509 CommonName deprecation' [1].
[1] - https://golang.org/doc/go1.15#commonname
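
To illustrate what the deprecation means in practice, here is a minimal
sketch (not the code from this change) that creates two self-signed
certificates and checks them with x509's VerifyHostname. selfSigned and
the host name are hypothetical. With Go 1.15 or newer, the certificate
that carries the name only in its CommonName is expected to fail
verification, while the one with a DNS SAN passes.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSigned returns a throwaway self-signed certificate for hostname.
// The name always goes into the CommonName, and additionally into the
// DNS SANs when withSAN is set.
func selfSigned(hostname string, withSAN bool) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: hostname},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	if withSAN {
		tmpl.DNSNames = []string{hostname}
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	const host = "service.example.com" // placeholder name
	for _, withSAN := range []bool{false, true} {
		cert, err := selfSigned(host, withSAN)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("withSAN=%v: VerifyHostname error: %v\n", withSAN, cert.VerifyHostname(host))
	}
}
```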
Change-Id: I228cde3e5651a3e36f527783f2ccb4a2f6b7a8e3
This will be, at some point, a script to run on Gerrit presubmit (ie.
right before merge).
For now, you can manually run it to ensure that Everything At Least
Kinda Works.
Change-Id: I28b305fa81a4ca4a8e94ce4daa06fe9ae0184fe8
Previously, we had the following setup:
                          .-----------.
                          | .....     |
                        .-----------.-|
                        | dcr01s24  | |
                      .-----------.-| |
                      | dcr01s22  | | |
                  .---|-----------| |-'
    .--------.    |   |---------. | |
    | dcsw01 | <----- | metallb | |-'
    '--------'        |---------' |
                      '-----------'
Ie., each metallb on each node directly talked to dcsw01 over BGP to
announce ExternalIPs to our L3 fabric.
Now, we rejigger the configuration to instead have Calico's BIRD
instances talk BGP to dcsw01, and have metallb talk locally to Calico.
                      .-------------------------.
                      | dcr01s24                |
                      |-------------------------|
    .--------.        |---------.   .---------. |
    | dcsw01 | <----- | Calico  |<--| metallb | |
    '--------'        |---------'   '---------' |
                      '-------------------------'
This makes Calico announce our pod/service networks into our L3 fabric!
Calico and metallb talk to each other over 127.0.0.1 (they both run with
Host Networking), but that requires one side to flip to passive mode. We
chose to do that with Calico, by overriding its BIRD config and
special-casing any 127.0.0.1 peer to enable passive mode.
We also override Calico's Other Bird Template (bird_ipam.cfg) to fiddle
with the kernel programming filter (ie. to-kernel-routing-table filter),
where we disable programming unreachable routes. This is because routes
coming from metallb have their next-hop set to 127.0.0.1, which makes
bird mark them as unreachable. Unreachable routes in the kernel will
break local access to ExternalIPs, eg. registry access from containerd.
All routes pass through without route reflectors and a full mesh as we
use eBGP over private ASNs in our fabric.
We also have to make Calico aware of metallb pools - otherwise, routes
announced by metallb end up being filtered by Calico.
This is all mildly hacky. Here's hoping that Calico will some day gain
metallb-like functionality, ie. IPAM for
externalIPs/LoadBalancers/...
There does seem to be one problem with this change (but I'm not
fixing it yet as it's not critical): metallb would previously only
announce IPs from nodes that were serving that service. Now, however,
the Calico internal mesh makes those appear from every node. This can
probably be fixed by disabling local meshing and enabling route
reflection on dcsw01 (to recreate the mesh routing through dcsw01). Or,
maybe by some more hacking of the Calico BIRD config :/.
Change-Id: I3df1f6ae7fa1911dd53956ced3b073581ef0e836
We just had an outage seemingly caused by N-I-C sending tons of traffic
to gitea, which in turn caused N-I-C to balloon in memory/CPU usage.
I haven't debugged the cause of this traffic, but I have disabled the
gitea TCP forward to Stop The Bleeding.
This change reflects ad-hoc production changes.
Change-Id: I37e11609f408fa3e3fbfafafba44dc83149b90a9
- we update NixOS to 20.09pre
- we fix an ACME option that's now required
- we switch from systemd-timesyncd to chrony (as timesyncd took a long
time to sync clocks after restart, leading to MON_CLOCK_SKEW errors
from ceph)
This has been deployed in production.
Change-Id: Ibfcd41567235bae3e3d8abeeed61f4694ae614ad
This adds a mod proxy system, called, well, modproxy.
It sits between Factorio server instances and the Factorio mod portal,
allowing for arbitrary mod download without needing the servers to know
Factorio credentials.
Change-Id: I7bc405a25b6f9559cae1f23295249f186761f212
ceph-waw2 currently has some production issues [1] which have started to
cause write failures in the registry. The registry is the only user of
ceph-waw2's affected pool, so we reduce the dumpster fire blast radius
by moving it over to ceph-waw3.
This has already been deployed and data has been migrated over (via
s3cmd sync), and the migration has been verified (by a push and pull,
and pull of an older image).
[1] - pgs stuck inactive in the object storage pool
Change-Id: I26789b52008bb7be953954ec3fd3dd727ac15347
In addition to k8s certificates, prodaccess now issues HSPKI
certificates, with DN=$username.sso.hswaw.net. These are installed into
XDG_CONFIG_HOME (or os equiv).
//go/pki will now automatically attempt to load these certificates. This
means you can now run any pki-dependent tool with -hspki_disable, and
with automatic mTLS!
Change-Id: I5b28e193e7c968d621bab0d42aabd6f0510fed6d
instead of Python packages
As usual with Python sadness, the @pydeps wheels are built on the bazel
host, so stuffing them inside a container_image (or py_image) will cause
new and unexpected kinds of misery.
Change-Id: Id4e4d53741cf2da367f01aa15c21c133c5cf0dba
The "Anyone can pull all images" rule only matched anonymous users. Now
it should match all users, including authenticated ones.
Change-Id: I2205299093feca51f30526ba305eadbaa0a68ecb
We would like gitea to have its ssh server exposed on TCP port 22 on the
same address as its web interface. We would also still like to use all
the automation around ingresses already in place (like cert-manager
integration).
To solve this, we create an additional LoadBalancer service for
nginx-ingress-controller and set up a special tcp-services forwarding
rule to pass port 22 traffic to the gitea-prod/gitea service, like we
already do in the case of gerrit.
Change-Id: I5bfc901ebe858464f8e9c2f3b2216b254ccd6c4d
This turns the existing script into a proper sh_binary, and injects
dependencies (kubectl and jq) as deps into it.
This change also pulls in BUILDfiles for jq, and a dep (oniguruma) into
//third_party, and adds buildable external repositories for them.
The jq/oniguruma BUILDfiles are lifted from
https://github.com/attilaolah/bazel-tools/.
Change-Id: If2e548bd60a8fd34e4f3be767ae59c6b2f2286d9
It was getting large and unwieldy (to the point where kubecfg was slow).
In this change, we:
- move the Cluster function to cluster.libsonnet
- move the Cluster instantiation into k0.libsonnet
- shuffle some fields around to make sure things are well split between
k0-specific and general cluster configs.
- add 'view' files that build on 'cluster.libsonnet' to allow rendering
either the entire k0 state, or some subsets (for speed)
- update the documentation, drive-by some small fixes and reindentation
Change-Id: I4b8d920b600df79100295267efe21b8c82699d5b
We're not using them for anything. Initially they were going to be used
for nixops, but nixops is not very good, so let's just drop them.
We still have a Nix dependency for clustercfg.py when provisioning
nodes, but rules_nix/nixpkgs in WORKSPACE were unrelated to that.
Change-Id: I28c249507d1be9c5dbbd1ee764deccd9ab038549
We handwavingly plan on implementing monitoring as a two-tier system:
- a 'global' component that is responsible for global aggregation,
long-term storage and alerting.
- multiple 'per-cluster' components, that collect metrics from
Kubernetes clusters and export them to the global component.
In addition, several lower tiers (collected by per-cluster components)
might also be implemented in the future - for instance, specific to some
subprojects.
Here we start sketching out some basic jsonnet structure (currently all
in a single file, with little parametrization) and a cluster-level
prometheus server that scrapes Kubernetes Node and cAdvisor metrics.
This review is mostly to get this committed as early as possible, and to
make sure that the little existing Prometheus scrape configuration is
sane.
Change-Id: If37ac3b1243b8b6f464d65fee6d53080c36f992c
This kills two birds with one stone:
- update the secretstore tool to be slightly smarter about secrets, to
the point where we can now just point it at a secret directory and
ask it to 'sync' all secrets in there
- run the new fancy sync command on all keys to update them, which
is a follow up to gerrit/328.
Change-Id: I0eec4a3e8afcd9481b0b248154983aac25657c40
This was an attempt to make new calico nodes use a full FQDN. However,
this change seemingly also makes the calico control plane use the FQDN
for all existing nodes, as such breaking CNI for new pods.
We revert this change, thereby keeping all calico node names as
hostnames. We could fix this by editing /var/lib/calico/nodename on
hosts to FQDNs, but it might not be worth the effort.
See https://github.com/projectcalico/calico/issues/1093 for more
context.
Change-Id: I52bfb00f604053d57d3009aebd6c50db7dc74f58
We still use etcd as the data store (and as such didn't set up k8s CRDs
for Calico), but that's okay for now.
Change-Id: If6d66f505c6b40f2646ffae7d33d0d641d34a963
This previously allowed all namespace admins (ie. personal-$user namespace
users) to create any sort of object they wanted within that namespace.
This could've been exploited to allow creation of a RoleBinding that
would then allow to bind a serviceaccount to the insecure
podsecuritypolicy, thereby allowing escalation to root on nodes.
As far as I've checked, this hasn't been exploited, and the access to
the k8s cluster has so far also been limited to trusted users.
This has been deployed to production.
Change-Id: Icf8747d765ccfa9fed843ec9e7b0b957ff27d96e
This bumps Rook/Ceph. The new resources (mostly RBAC) come from
following https://rook.io/docs/rook/v1.1/ceph-upgrade.html .
It's already deployed on production. The new CSI driver has not been
tested, but the old flexvolume-based provisioners still work. We'll
migrate when Rook offers a nice solution for this.
We've hit a kubecfg bug that does not allow controlling the CephCluster
CRD directly anymore (I had to apply it via kubecfg show / kubectl apply
-f instead). This might be due to our bazel/prod k8s version mismatch,
or it might be related to https://github.com/bitnami/kubecfg/issues/259.
Change-Id: Icd69974b294b823e60b8619a656d4834bd6520fd