October check-in: one-year retrospective

For the last month I have been working at my new job and not at all on this project. Prior to that, I had been working on this project for a year and gotten something done every day, no exceptions, for eight months.

There were two reasons for the change. The first was that I am having a great time working with a team again and I wanted to give that all my attention while getting settled. The second reason was that I needed a break and some distance to see where to go next. Now I want to recap what I did with the year I’ve just spent, what I learned, and the status of the project.

After changing course a few times, I established a goal early in the year of building a minimally expensive and maintenance-free hostable personal social media system. I wanted it to enable sharing text and image posts with friends, comments and likes, and potentially some kind of payment mechanism.

The first prototype I built was a simple serverless static site generator based on markdown files dropped in S3. It validated certain routing, static site generation, and web-serving patterns that I expected to use.

Then I hit a decision point. It was not acceptable for the system operator to manage the blog through the S3 UI. That meant that I needed to choose a new user interface. I found an iPhone app that lets you upload to S3; it was close, but in the end too hacky. I considered building a chatbot so that the system could be managed through a private slack org, but I didn’t want to require even more setup and I didn’t want to be tied to slack.

So I adapted and carefully tested a serverless oauth application design and created an admin website module. I hadn’t done that before and it took a few months to get working at the level of confidence I wanted.

During this time I was also periodically trying to recreate the whole environment from scratch. Long-lived terraform stacks can develop dependency cycles between resources such that the stack can’t be built from scratch. The other long-term maintenance difficulty was the state file. The state file duplicates the records of resources in proportion to where they are used, so certain types of interconnection between different modules cause it to grow exponentially^[1]. Both of these difficulties can be managed by structuring modules carefully. A few random rules that I follow; I might expand or change them:

In general if a variable is referenced in more than one place it should be declared as a local.
Wherever possible, modules should use string operations on variables (not resource outputs) to build exported resource IDs and arns (which is not difficult; arns are simple, well-specified formats and aws themselves sometimes advocate doing so). The reason for this rule is that using a terraform resource output creates a dependency between the resource and the place where it’s referenced, while using information available in the inputs does not^[2]. In a small minority of cases the resources have actual dependencies on each other, but I prefer to use the depends_on argument to enforce that rather than implicit links created by resource outputs.
Modules that hold state must also manage access permissions for that state. External modules may not give themselves access to things.
Modules that do more than wrap a specific aws resource are given fanciful names like “tetrapod” or “cyanobacteria” instead of functional names like “blog.” This is because I invariably end up trying out a few versions, and it’s hard to have many similarly-named functional components.
Commonly-used resources like caller region and caller account ID should be at the root level and passed in to each module. It’s too easy to end up looking them up hundreds of times.
Every resource module must accept or generate a random string to include in its name. It should expose its useful ids (generated with string operations) on its output. This enables certain types of parallel environment creation.
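As a minimal sketch of how several of these rules might fit together in one module (this is an illustration, not the project's actual code): the module name "tetrapod", the variable names, and the choice of a DynamoDB table are all assumptions made for the example, and the ARN hardcodes the standard "aws" partition.

```hcl
# Hypothetical module "tetrapod" wrapping one DynamoDB table.
# Every name below is illustrative; none comes from the project itself.

variable "name_suffix" {
  type        = string
  description = "Random string chosen by the caller so parallel stacks do not collide."
}

variable "account_id" {
  type        = string
  description = "Caller account ID, looked up once at the root and passed in."
}

variable "region" {
  type        = string
  description = "Caller region, looked up once at the root and passed in."
}

locals {
  # Referenced in more than one place, so declared once as a local.
  table_name = "tetrapod-${var.name_suffix}"

  # DynamoDB table ARNs follow a fixed format, so the ARN is assembled from
  # inputs instead of being read from aws_dynamodb_table.this.arn.
  # Assumes the standard "aws" partition.
  table_arn = "arn:aws:dynamodb:${var.region}:${var.account_id}:table/${local.table_name}"
}

resource "aws_dynamodb_table" "this" {
  name         = local.table_name
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "id"

  attribute {
    name = "id"
    type = "S"
  }
}

# Outputs are derived from locals, so a consumer of these values does not
# pick up an implicit dependency on the table resource.
output "table_name" {
  value = local.table_name
}

output "table_arn" {
  value = local.table_arn
}
```

A consumer that genuinely needs the table to exist before it runs would have to say so explicitly (for example with depends_on), since nothing in these outputs implies it.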

The first few times I “finished” a piece like the blog or admin page, I would end up spending one or two weeks untangling deployment issues before terraform could apply it from scratch. In more recent instances that process takes less than a day. The main gist of what I’ve learned is that even though terraform is pretty forgiving of infrastructure tangles, the more carefully you can avoid them the more efficient it is.

The login system unlocked my ability to do UIs, so I experimented with a few different patterns. I did two pages for the blog (a blog list and post authoring page) and then I made a separate plugin for the cost-tracker. I spent a lot of time on the security design to isolate plugins from each other. I set up a pattern in which every plugin gets the same basic layout, so plugins don’t need to worry too much about desktop / mobile, but can still customize that layout arbitrarily. The UI (i.e. the actual files in s3) is delivered in the same terraform apply command that sets up everything else. There is extremely little UI-specific development tooling anywhere, just a js bundling step. The most heavyweight frontend library I’m using is Prosemirror, because it’s a lovely thing and even I’m not crazy enough to try to build a text editor from scratch.
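One common way to deliver bundled UI files as part of a terraform apply is to upload each file as its own S3 object. The sketch below shows that general pattern, not necessarily the one used here; the variable name, the "dist" directory, and the content-type map are assumptions, and the aws_s3_object resource requires version 4 or later of the AWS provider.

```hcl
variable "site_bucket_name" {
  type        = string
  description = "Hypothetical: name of the bucket that serves the UI."
}

locals {
  # Assumed output directory of the js bundling step.
  ui_dir = "${path.module}/dist"

  # Minimal extension-to-MIME map; anything else falls back to a default.
  content_types = {
    html = "text/html"
    js   = "application/javascript"
    css  = "text/css"
  }
}

resource "aws_s3_object" "ui" {
  # One object per file found under ui_dir, recursively.
  for_each = fileset(local.ui_dir, "**")

  bucket = var.site_bucket_name
  key    = each.value
  source = "${local.ui_dir}/${each.value}"

  # Changing a file's contents changes its MD5, which triggers a re-upload.
  etag = filemd5("${local.ui_dir}/${each.value}")

  # Look up the MIME type by the text after the last "." in the file name.
  content_type = lookup(
    local.content_types,
    element(split(".", each.value), length(split(".", each.value)) - 1),
    "application/octet-stream"
  )
}
```

With this shape, a UI change is just another diff in the plan, which fits having no separate frontend deployment tooling.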

The cost tracker was the second plugin I built. Pretty much every person in a leadership position on a tech team knows how hard it can be to get good information on budgeting, and it was interesting to spend some time getting to grips with it. It really is pretty fantastically complicated, and the main thing I learned was that you save yourself a lot of pain if you settle for “good enough” early in the process, because “good enough” is all you’re going to get no matter how long you spend. It was at this point that I could see in my own UI that all of my service usage and development activity was costing about $1.30 / month^[3].

The final pre-alpha feature was a pageview counter. This turned out to be a use-case for sharing data between plugins (in this case between the blog and visibility plugins) so it took a bit more work to understand the right way to do that.

It was August 2020 when I set out on this project; it was a little over 11 months later, in July 2021, when I released my alpha version of this system. It fell considerably short of the goal I set (it did not have any social features at all), but it was a high-reliability, low-cost, low-maintenance blogging system that was enabled by its foundation rather than encumbered by tech debt, so I’m pleased with the result.

The other thing I realized at this point was that I would have to be very patient with my continuing development. After I had the basic post-authoring system in place, it would take me about a week of trying to write posts on my phone in different situations before I really had any confidence that it was right. A few rounds of revisions in this way ate up about another month.

It was around this time that the year I had given myself for this project ended. I was still feeling pretty optimistic overall, but I’d realized a couple of things. First, I had learned that the speed at which I build UIs and get comfortable with them is pretty slow. It was obviously going to be a few months or more before I had a UI that anyone besides me wanted to use. In addition, I still had to build the plumbing for things like a news feed, comments, likes, etc.

So there was another decision to make. I’d had no money coming in for a year and I wasn’t realistically within six months of any plausibly world-beating release. I had the option to keep going but it seemed prudent to look around at the job market in case there were any obvious gems lying around—the job market is fairly untidy over long periods. So I started keeping an eye on job boards while I was working on the plumbing for the social features.

I had one final multi-day de-tangling session when I started working on the social features because it was the first time that I had deployed a parallel terraform stack. I established a basic design for interaction between connected sites and implemented a UI for making connections between sites. I also added a post-notification system that distributes posts among connected sites (but there’s no news feed UI yet so there isn’t a page where you can read connections’ posts).

It was at this point that I found my current job, so it is in this state that the project has languished for a month or so. That’s kinda comforting: given no attention, I come back and everything still works like when I left it.

My next task is to continue building out the social UI features. I will continue to move slowly since my first priority is my day job, but I’m still excited about moving this work forward.

[1] A heavyweight state file is bad for two reasons: first, it needs to be uploaded and downloaded when terraform runs, so it can add seconds to that process. Second, since it’s so critical it should always be stored in a versioning-enabled way, so it’s potentially stored many different times (modulo compression & deduplication).
[2] If you’re a person who works with terraform, the critical observation here is that at runtime, terraform conceptually flattens all of the modules into one big graph of resources and data sources. What this means is that dependency cycles only matter if they implicate async operations like data sources or resource creation. If Module A takes a known-at-start-time string input from Module B, uses it in a set of fixed, synchronous string operations to create an ID, and exports that ID, Module B has no trouble depending on that output from Module A, even though it is technically a circular reference.
[3] This number excludes $12 of yearly domain registration fees but includes $0.50 / month of DNS for an unrelated domain name. A more accurate amortized monthly expense for the entire system would be about $1.80 ($1.30, plus $1.00 / month of amortized registration, minus the $0.50 of unrelated DNS).
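To make the note on dependency cycles above concrete, here is a minimal sketch of two modules that each consume an output of the other. The module names, variable names, and file layout are hypothetical, and the comment lines mark separate files. Because each output is built only from a string known before any resource is created, the graph should contain no cycle even though the modules reference each other.

```hcl
# --- modules/a/main.tf (hypothetical) ---
variable "suffix" {
  type = string
}

variable "b_id" {
  type = string
}

locals {
  # Fixed string operation on a value known at the start of the run.
  a_id = "a-${var.suffix}"
}

# Resources in this module could use var.b_id; they are omitted here.

output "a_id" {
  value = local.a_id
}

# --- modules/b/main.tf (hypothetical) ---
variable "suffix" {
  type = string
}

variable "a_id" {
  type = string
}

locals {
  b_id = "b-${var.suffix}"
}

# Resources in this module could use var.a_id; they are omitted here.

output "b_id" {
  value = local.b_id
}

# --- main.tf (root, hypothetical) ---
variable "suffix" {
  type    = string
  default = "x1"
}

module "a" {
  source = "./modules/a"
  suffix = var.suffix
  b_id   = module.b.b_id
}

module "b" {
  source = "./modules/b"
  suffix = var.suffix
  a_id   = module.a.a_id
}
```

If either output were instead derived from a resource attribute that itself depended on the other module's output, the same wiring would be expected to fail with a cycle error.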

Raphael Luckom
