ashton
baker

a small personal blog

Building a self-archiving blog with git

> published 2019-06-29 - revised 2019-06-29 (permalink)

As my GeoCities site, which died with the rest of them in 2009, can attest, nothing on the web lasts forever. That's particularly true of media platforms like the ones that host our blogs. One day, perhaps soon and perhaps not, they will fizzle out, explode, or disappear, and their data will be unceremoniously deleted. Of course, there's no use crying over the increase of entropy, but there is something alarming about the rate of this decay, given the great importance of the internet to our culture. And while projects like the Internet Archive offer some hope, webpages usually aren't designed to be archived, any more than cells are designed to live in a dish.

So I thought it would be interesting to write a website as a more-or-less self-contained program, which could be downloaded and run, zipped up and archived any way you please. This website is my attempt at such a program. You can download it here, and build it with Make. Beyond that, it requires only Git and Python to build.

I wrote a little bit here about why I wanted to make a blog in the first place, but I had all of these aspirations about the actual machinery of the site that caused me to put it off for quite a while. I have to thank Drew DeVault for offering twenty dollars to anyone who started a blog. That seems to have been the carrot on a stick I needed to get the creative juices flowing.

My first goal was to write this blog in as minimal a flavor of HTML as I could. I hadn't written HTML by hand in a long time, and it was a great learning experience. The final result might not have many modern amenities, but like a log cabin deep in the woods, perhaps it's a nice reprieve from the modern web. It was fun to make, and unlike a WordPress theme, it's my own work, which makes me feel inspired to work on it and add to it.

After tinkering for a while, I realized that copying the sidebar to each page was quickly going to become infeasible. I didn't think that quite warranted the full power of a static site generator, so I decided to keep building things my own way: I created a simple Python script that takes a page template and a directory of blog posts and spits out my fully baked pages.
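To give a feel for that kind of script, here is a minimal sketch. It is not the site's actual scripts/build.py: the function name build_pages, the $title and $content placeholders, and the assumption that posts are stored as .html fragments are all made up for illustration.

```python
from pathlib import Path
from string import Template


def build_pages(template_path, posts_dir, out_dir):
    """Wrap every post fragment in the shared page template.

    Hypothetical layout: the template holds the sidebar plus $title and
    $content placeholders, and each post is an HTML fragment in posts_dir.
    """
    template = Template(Path(template_path).read_text(encoding="utf-8"))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for post in sorted(Path(posts_dir).glob("*.html")):
        body = post.read_text(encoding="utf-8")
        # safe_substitute leaves unknown placeholders in the template
        # untouched instead of raising an error.
        page = template.safe_substitute(title=post.stem, content=body)
        target = out / post.name
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written
```

Calling build_pages("page.html", "posts", "www") would then write one finished page per post, so the sidebar only has to live in the template.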

The final piece of the puzzle was to create an "archival link" for each page. The goal was for each revision of the page to have its own link, which would not change as the rest of the site changed. Since the whole site is in a Git repo, the repo itself can serve as the source of history. The idea is this: at each commit, you can archive the site by calling $ make archive_this_version, so each commit is responsible for building its own version of the website into a directory, say ./www/ed054a38/. To build an archive of the whole site, we just check out each commit in order and try to build it. At the end of the process, we copy the latest archive to the root of the site, and that's what people see when they first visit.
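The loop described above can be sketched in Python. This is only an illustration of the idea, not the site's real make_archive.sh: archive_all, git_commits, and git_checkout are hypothetical names, and the checkout and build steps are passed in as functions so the loop itself stays independent of how a commit actually gets built.

```python
import shutil
import subprocess
from pathlib import Path


def git_commits(repo):
    """Short hashes of every commit reachable from HEAD, oldest first."""
    result = subprocess.run(
        ["git", "-C", str(repo), "log", "--reverse", "--format=%h"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def git_checkout(repo, commit):
    """Check out one commit in the cloned copy of the repo."""
    subprocess.run(
        ["git", "-C", str(repo), "checkout", "--quiet", "--detach", commit],
        check=True,
    )


def archive_all(commits, checkout, build, www):
    """Build each commit into www/<hash>/, then copy the newest to www/."""
    www = Path(www)
    latest = None
    for commit in commits:
        checkout(commit)
        dest = www / commit
        dest.mkdir(parents=True, exist_ok=True)
        try:
            build(commit, dest)
        except Exception:
            # Not every old commit is expected to build; skip the ones
            # that fail and leave no half-built directory behind.
            shutil.rmtree(dest, ignore_errors=True)
            continue
        latest = dest
    if latest is not None:
        # The most recent successful build becomes the front of the site.
        shutil.copytree(latest, www, dirs_exist_ok=True)
    return latest
```

In a real run, checkout could be something like lambda c: git_checkout("repo_copy", c), and build would call whatever that commit uses to render itself. Note that dirs_exist_ok needs Python 3.8 or newer.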

If you're interested in the implementation, here's the Makefile:

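# Short hash of the commit being built.
# The != shell assignment needs BSD make or GNU make 4.0+.
this_hash != git rev-parse --short HEAD

# Build every archived version, then copy the newest one to the site root.
make_site: make_archive
	mkdir -p ./www
	cp -r ./repo_copy/www/* ./www/
	cp -r ./www/$(this_hash)/* ./www/

# Clone the repo into ./repo_copy (assumed to be where make_archive.sh
# builds each commit), then let the script do the git wrangling.
make_archive:
	git clone ./ repo_copy
	bash ./make_archive.sh

# Build this commit's version of the site into its own directory.
archive_this_version:
	mkdir -p ./www/$(this_hash)/
	./scripts/build.py ./www/$(this_hash)/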

The ./make_archive.sh script does the git wrangling, and the scripts/build.py script does all of the templating. I also added some functionality to add revision notes and an archival link to each page. If you visit an archive link, you can browse a full version of the site as it existed at that commit.

There are lots of reasons not to build a site this way (I'm rebuilding and hosting a whole new copy of the site for each commit), but the combination of small text files and infrequent posts should keep things manageable for a long time. And in the future, I could devise some system to archive only the files that changed at each commit. I'm not making too many assumptions about the previous commits, only that I can visit some of them and build the site, so I can foresee adding functionality over time.
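One way such a system might start, purely as a sketch of something the site does not do today, is to compare two built versions by content hash and keep only the files that are new or different. The names file_digests and changed_files are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each file's path (relative to root) to a SHA-256 of its bytes."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(previous_build, current_build):
    """Relative paths that are new or different in current_build."""
    old = file_digests(previous_build)
    new = file_digests(current_build)
    return sorted(path for path, digest in new.items() if old.get(path) != digest)
```

An archive step could then copy only the paths returned by changed_files and point everything else back at an earlier version. One caveat: if every page embeds its own commit hash in an archival link, every page would differ at every commit, so the comparison would need to happen before that link is stamped in.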

More importantly, this is a new place to organize some of my scattered projects. I'm excited that I own it completely, and excited to add to it in the future.