Absurd crimes with shadow bucket migration
Published on , 1005 words, 4 minutes to read
Because of course you can use an AI model as a key-value store.

S3 is a key-value store that lets you put "objects" into "buckets" and then retrieve them and their metadata later. The original intent for S3 was to function as an "unlimited FTP server", or a place where you can just put data and later get it back without having to care about how or why it's stored. This is a great model for a lot of things, but there are some limitations with how S3 fundamentally works that can make it difficult to enjoy.
Tigris is a globally distributed object store that implements the S3 API, but actually stores your data globally. One of the features it offers is shadow bucket migration, which allows you to set up a bucket that lazily copies objects over from another bucket on demand. This allows you to migrate over the objects that are actually used from your old provider to Tigris without having to do a big upload of everything that will undoubtedly cost you an arm, a leg, and both of your spare kidneys.
However, this feature isn't just limited to S3 providers. In theory, it works for every platform that can implement a passable version of the S3 GetObject call. This means you can use this to cache anything you want, from any source you want, in Tigris. This is where the magic of fall-through caching comes in.
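To make "a passable version of the S3 GetObject call" a bit more concrete, here is a minimal sketch of an origin that answers path-style GET requests (/bucket/key) with a body and the usual object headers. Everything here is an assumption for illustration: the render function is a hypothetical stand-in for whatever generates your object, the header set is my guess at a reasonable minimum, and it skips request signing and S3's XML error bodies entirely. The post does not say exactly which parts of GetObject Tigris needs, so treat this as a starting point rather than a known-good shim.

```python
import hashlib
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import unquote, urlsplit


def render(key: str) -> bytes:
    # Hypothetical generator: stands in for the expensive work that makes the object.
    return f"generated for {key}\n".encode("utf-8")


class GetObjectHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Path-style addressing: /<bucket>/<key>, ignoring any query string.
        path = unquote(urlsplit(self.path).path)
        parts = path.lstrip("/").split("/", 1)
        if len(parts) != 2 or not parts[1]:
            # Real S3 returns an XML NoSuchKey error; a bare 404 is a simplification.
            self.send_error(404, "NoSuchKey")
            return

        body = render(parts[1])
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        # S3 ETags for simple uploads are the quoted MD5 of the body.
        self.send_header("ETag", '"' + hashlib.md5(body).hexdigest() + '"')
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the sketch quiet


def make_server(port: int = 0) -> ThreadingHTTPServer:
    # Port 0 asks the OS for any free port.
    return ThreadingHTTPServer(("127.0.0.1", port), GetObjectHandler)
```

Calling make_server().serve_forever() starts it; the bucket name in the path is accepted but ignored, since this fake origin only has one "bucket".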
Fall-through caching is a term I'm coining that describes the above process. The basic idea is that if an object is already in the cache, then serve it directly. If not, then generate it and return it for the cache to store and then serve to the user. This is a great way to save on compute costs, because you only have to generate the object once and then it's stored in the cache for you.
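The logic above fits in a few lines. This is a minimal in-memory sketch, assuming a hypothetical generate callable as the expensive origin and a plain dict standing in for the bucket; in the setup this post describes, Tigris plays the role of the dict and does the storing for you.

```python
from typing import Callable, Dict


class FallThroughCache:
    """Serve from the cache when possible; otherwise generate, store, and serve."""

    def __init__(self, generate: Callable[[str], bytes]) -> None:
        self._generate = generate            # expensive origin call (hypothetical)
        self._store: Dict[str, bytes] = {}   # stand-in for the bucket
        self.misses = 0                      # how many times we had to generate

    def get(self, key: str) -> bytes:
        # Cache hit: serve directly.
        if key in self._store:
            return self._store[key]
        # Cache miss: fall through to the generator, keep the result, then serve it.
        self.misses += 1
        value = self._generate(key)
        self._store[key] = value
        return value
```

The second request for the same key never reaches the generator, which is where the compute savings come from.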
Tigris shadow bucket migration made this really easy and elegant to implement. I wish there were a more direct way that wasn't as criminal as this, but for now this works (to my horror).
I implemented this for XeDN's avatar generator. It's a Gravatar-compatible endpoint that I originally wrote with voice coding on livestream. The main gimmick is that it translates the MD5 hash in the URL to a prompt for Stable Diffusion. This, combined with a fixed seed derived from the lower bytes of the hash, means that I'm effectively using Stable Diffusion as a key-value store. The key is the prompt, and the value is the generated image.
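Here is a rough sketch of how a hash can become a deterministic prompt and seed. The word lists, the choice of which bytes pick which words, and the 32-bit seed width are all assumptions made up for illustration; the real generator's mapping is in the code on GitHub and is not reproduced here. The only part taken from outside is Gravatar's convention of hashing the trimmed, lowercased email address with MD5.

```python
import hashlib
from typing import Tuple

# Hypothetical vocabularies; the real generator's word lists are not shown in this post.
STYLES = ["watercolor", "pixel art", "oil painting", "line art"]
SUBJECTS = ["fox", "robot", "wizard", "cat"]


def email_to_hash(email: str) -> str:
    # Gravatar-style key: MD5 of the trimmed, lowercased address.
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def hash_to_request(md5_hex: str) -> Tuple[str, int]:
    raw = bytes.fromhex(md5_hex)
    if len(raw) != 16:
        raise ValueError("expected a 32-character MD5 hex digest")

    # Upper bytes pick words for the prompt (assumed layout).
    style = STYLES[raw[0] % len(STYLES)]
    subject = SUBJECTS[raw[1] % len(SUBJECTS)]
    prompt = f"{style} portrait of a {subject}"

    # Lower four bytes become the seed, so the same hash always yields the same image request.
    seed = int.from_bytes(raw[-4:], "big")
    return prompt, seed
```

Because both the prompt and the seed come from the hash alone, the same URL always asks the model for the same picture, which is what makes it behave like a lookup.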
If you want to try this out, check out the live demo. Type some stuff in the box and see what you get. It's great fun.
The main downside of this approach is that I haven't implemented authentication yet, so anyone who knows the URL can generate images. This is fine for my use case, but you might want to add some kind of authentication if you're going to use this in production. Implementing authentication is therefore trivial and thus an exercise for the reader.
If you want to see the code, it's on GitHub. Just keep in mind that it's your fault if you use it and it breaks something horribly in production. This has not been extensively tested, and I'm not responsible for anything that happens as a result of you using this code.
Oh also, I served it on XeDN using Tigris bucket statics. Here's what I put in XeDN's fly.toml to make it happen:
[[statics]]
url_prefix = "/avatar"
guest_path = "/"
tigris_bucket = "azurda"
This configures fly-router to send any requests on /avatar/.* to the azurda bucket on Tigris. I use a similar strategy to serve all of my static assets.