Cadey is coffee
<Cadey> Hello! Thank you for visiting my website. You seem to be using an ad-blocker. I understand why you do this, but I'd really appreciate if it you would turn it off for my website. These ads help pay for running the website and are done by Ethical Ads. I do not receive detailed analytics on the ads and from what I understand neither does Ethical Ads. If you don't want to disable your ad blocker, please consider donating on Patreon or sending some extra cash to xeiaso.eth or 0xeA223Ca8968Ca59e0Bc79Ba331c2F6f636A3fB82. It helps fund the website's hosting bills and pay for the expensive technical editor that I use for my longer articles. Thanks and be well!

Prometheus and Aegis

Read time in minutes: 10

Last time in the xeiaso dot net cinematic universe:

Unix sockets started to be used to grace the cluster. Things were at peace. Then, a realization came through:

Mara is hmm
<Mara> What about Prometheus? Doesn't it need a direct line of fire to the service to scrape metrics?

This could not do! Without observability the people of the Discord wouldn't have a livefeed of the infrastructure falling over! This cannot stand! Look, our hero takes action!

Cadey is percussive-maintenance
<Cadey> It will soon!

In order to help keep an eye on all of the services I run, I use Prometheus for collecting metrics. For an example of the kind of metrics I collect, see here (1). In the configuration that I have, Prometheus runs on a server in my apartment and reaches out to my other machines to scrape metrics over the network. This worked great when I had my major services listen over TCP, I could just point Prometheus at the backend port over my tunnel.

When I started using Unix sockets for hosting my services, this stopped working. It became very clear very quickly that I needed some kind of shim. This shim needed to do the following things:

  • Listen over the network as a HTTP server
  • Connect to the unix sockets for relevant services based on the path (eg. /xesite should get the metrics from /srv/within/run/xesite.sock)
  • Do nothing else

The Go standard library has a tool for doing reverse proxying in the standard library: net/http/httputil#ReverseProxy. Maybe we could build something with this?

Mara is hmm
<Mara> The documentation seems to imply it will use the network by default. Wait, what's this Transport field?

type ReverseProxy struct {
  // ...

  // The transport used to perform proxy requests.
  // If nil, http.DefaultTransport is used.
  Transport http.RoundTripper

  // ...
}

Mara is hmm
<Mara> So a transport is a RoundTripper, which is a function that takes a request and returns a response somehow. It uses http.DefaultTransport by default, which reads from the network. So at a minimum we're gonna need:
  • a ReverseProxy
  • a Transport
  • a dialing function
    • Right?

Yep! Unix sockets can be used like normal sockets, so all you need is something like this:

func proxyToUnixSocket(w http.ResponseWriter, r *http.Request) {
  name := path.Base(r.URL.Path)

  fname := filepath.Join(*sockdir, name+".sock")
  _, err := os.Stat(fname)
  if os.IsNotExist(err) {
    http.NotFound(w, r)
    return
  }

  ts := &http.Transport{
    Dial: func(_, _ string) (net.Conn, error) {
      return net.Dial("unix", fname)
    },
    DisableKeepAlives: true,
  }

  rp := httputil.ReverseProxy{
    Director: func(req *http.Request) {
      req.URL.Scheme = "http"
      req.URL.Host = "aegis"
      req.URL.Path = "/metrics"
      req.URL.RawPath = "/metrics"
    },
    Transport: ts,
  }
  rp.ServeHTTP(w, r)
}

Mara is hmm
<Mara> So in this handler:

name := path.Base(r.URL.Path)

fname := filepath.Join(*sockdir, name+".sock")
_, err := os.Stat(fname)
if os.IsNotExist(err) {
  http.NotFound(w, r)
  return
}

ts := &http.Transport{
  Dial: func(_, _ string) (net.Conn, error) {
    return net.Dial("unix", fname)
  },
  DisableKeepAlives: true,
}

Mara is hmm
<Mara> You have the socket path built from the URL path, and then you return connections to that path ignoring what the HTTP stack thinks it should point to?

Yep. Then the rest is really just boilerplate:

package main

import (
  "flag"
  "log"
  "net"
  "net/http"
  "net/http/httputil"
  "os"
  "path"
  "path/filepath"
)

var (
  hostport = flag.String("hostport", "[::]:31337", "TCP host:port to listen on")
  sockdir  = flag.String("sockdir", "./run", "directory full of unix sockets to monitor")
)

func main() {
  flag.Parse()

  log.SetFlags(0)
  log.Printf("%s -> %s", *hostport, *sockdir)

  http.DefaultServeMux.HandleFunc("/", proxyToUnixSocket)

  log.Fatal(http.ListenAndServe(*hostport, nil))
}

Now all that's needed is to build a NixOS service out of this:

{ config, lib, pkgs, ... }:
let cfg = config.within.services.aegis;
in
with lib; {
  # Mara\ this describes all of the configuration options for Aegis.
  options.within.services.aegis = {
    enable = mkEnableOption "Activates Aegis (unix socket prometheus proxy)";

    # Mara\ This is the IPv6 host:port that the service should listen on.
    # It's IPv6 because this is $CURRENT_YEAR.
    hostport = mkOption {
      type = types.str;
      default = "[::1]:31337";
      description = "The host:port that aegis should listen for traffic on";
    };

    # Mara\ This is the folder full of unix sockets. In the previous post we
    # mentioned that the sockets should go somewhere like /tmp, however this
    # may be a poor life decision: 
    # https://lobste.rs/s/fqqsct/unix_domain_sockets_for_serving_http#c_g4ljpf
    sockdir = mkOption {
      type = types.str;
      default = "/srv/within/run";
      example = "/srv/within/run";
      description =
        "The folder that aegis will read from";
    };
  };

  # Mara\ The configuration that will arise from this module if it's enabled
  config = mkIf cfg.enable {
    # Mara\ Aegis has its own user account to keep things tidy. It doesn't need
    # root to run so we don't give it root.
    users.users.aegis = {
      createHome = true;
      description = "tulpa.dev/cadey/aegis";
      isSystemUser = true;
      group = "within";
      home = "/srv/within/aegis";
    };

    # Mara\ The systemd service that actually runs Aegis.
    systemd.services.aegis = {
      wantedBy = [ "multi-user.target" ];

      # Mara\ These correlate to the [Service] block in the systemd unit.
      serviceConfig = {
        User = "aegis";
        Group = "within";
        Restart = "on-failure";
        WorkingDirectory = "/srv/within/aegis";
        RestartSec = "30s";
      };

      # Mara\ When the service starts up, run this script.
      script = let aegis = pkgs.tulpa.dev.cadey.aegis;
      in ''
        exec ${aegis}/bin/aegis -sockdir="${cfg.sockdir}" -hostport="${cfg.hostport}"
      '';
    };
  };
}

Cadey is enby
<Cadey> Then I just flicked it on for a server of mine:

within.services.aegis = {
  enable = true;
  hostport = "[fda2:d982:1da2:180d:b7a4:9c5c:989b:ba02]:43705";
  sockdir = "/srv/within/run";
};

Cadey is enby
<Cadey> And then test it with curl:

$ curl http://[fda2:d982:1da2:180d:b7a4:9c5c:989b:ba02]:43705/printerfacts
# HELP printerfacts_hits Number of hits to various pages
# TYPE printerfacts_hits counter
printerfacts_hits{page="fact"} 15
printerfacts_hits{page="index"} 23
printerfacts_hits{page="not_found"} 17
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.06
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 12
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 5296128
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1617458164.36
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 911777792

Cadey is aha
<Cadey> And there you go! Now we can make Prometheus point to this and we can save Christmas!

Mara is happy
<Mara> :D


This is another experiment in writing these kinds of posts in more of a Socratic method. I'm trying to strike a balance with a limited pool of stickers while I wait for more stickers/emoji to come in. Feedback is always welcome.

(1): These metrics are not perfect because of the level of caching that Cloudflare does for me.


This article was posted on M04 05 2021. Facts and circumstances may have changed since publication. Please contact me before jumping to conclusions if something seems wrong or unclear.

Tags: prometheus o11y

This post was not WebMentioned yet. You could be the first!

The art for Mara was drawn by Selicre.

The art for Cadey was drawn by ArtZora Studios.

Some of the art for Aoi was drawn by @Sandra_Thomas01.