Wordpress to Toto

27 May 2011

Rationale

Though I posted infrequently, my existing blog was a maintenance nightmare. It had been a playground of mine since 2007, and over that time I had accumulated a Wordpress SQL database and corresponding XML backups, with some entries polluted with WYSIWYG clutter and others written with Markdown for Wordpress.

Things became more organised when I switched to git-wordpress as the canonical source for writing and publishing. This came with its own problems, however: each post was littered with Wordpress identifiers and still had to be published through the web interface.

Choosing toto

I wanted more control. I wanted something simpler. I wanted something to hack on.

There are many static website generators that fulfil those requirements, the central idea being to kill the database, instead creating a website from plain-text files.

A popular choice is Jekyll, but I chose toto because of its simplicity. toto is also designed to run on Heroku, a Ruby web app host. For a low-traffic site like mine, there is absolutely no reason to pay for hosting these days. Heroku provides an excellent free service, but also check out GitHub Pages and Drydrop.

Importing old posts

I had a mass of unfinished draft posts in Wordpress. The first job was to begin a fresh repository that retained the content and history of the drafts in my git-wordpress repository.

git filter-branch is the tool for this job.

Clone the existing repository:

 git clone --no-hardlinks /path/to/existing/repo new-repo
 cd new-repo

Delete what you don’t want (from all branches):

 git filter-branch --index-filter \
   "git rm -rf --cached --ignore-unmatch articles" --prune-empty -- --all

Clean up (filter-branch keeps backups of the old refs under refs/original):

 rm -rf .git/refs/original
 git remote rm origin
 git reflog expire --expire=now --all
 git gc --aggressive --prune=now

Customising toto

In the spirit of a popular git branching strategy, I broke the customisations into distinct components:

Design

Minimalism. Nothing else. Growing tired of the plague of bloated websites, I’ve started a collection of minimal stylesheets for the most worthy. Design is hard.

I wanted my website to be as readable as possible, with the focus purely on content. Rather than reinventing the wheel, I started from the html5boilerplate and built up from there. Using Vladimir Carrer’s Better Web Readability Project as a guideline, I’ve aimed for something with as little visual noise as possible.

Syntax Highlighting

I write a lot of code, so it had to look good. This was a simple one. Following the syntax highlighting guide on toto’s wiki, I chose a server-side approach using Rack::CodeHighlighter and CodeRay, which avoids the extra page weight and HTTP requests of a JavaScript highlighter. Simply adding it to the Gemfile and integrating a theme sufficed.

Search Engine Optimisation

Using a similar approach to Dmitry Fadeyev’s, I write a one-line abstract for each post that’s used in the meta description tags. I also used his logic to generate friendly page titles.

The second, and unfortunately laborious, task was to make sure the new site structure was indexed correctly by Google. Following the excellent moving your site guide, the majority of the work involved writing 301 redirect rules.

Since the URL style changed from…

/[category]/[post]/

to:

/[YYYY]/[MM]/[DD]/[post]/

… a regex solution seemed impossible.

Although it is beyond the scope of this post, I rolled up my Python sleeves and wrote a script to do it for me. gen301 takes a list of URLs and, using fuzzy search techniques, checks file names for possible matches. I ran it against my journal and, using rack-rewrite, solved the problem. Please let me know how I could have done it more easily.
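To give a flavour of the idea, here is a minimal sketch of fuzzy matching old URLs against article file names. It is not the actual gen301 code: the function names, the 0.6 cutoff, and the assumption that articles are named YYYY-MM-DD-slug.txt are all illustrative, and it leans on difflib from Python’s standard library rather than anything cleverer.

```python
import difflib
import re

# Assumed (hypothetical) article naming scheme: YYYY-MM-DD-slug.txt
ARTICLE_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.txt$")


def slug_of(old_url):
    """Return the last path segment of an old /[category]/[post]/ URL."""
    return old_url.strip("/").split("/")[-1]


def new_url_of(filename):
    """Build the /[YYYY]/[MM]/[DD]/[post]/ URL for a dated article file."""
    match = ARTICLE_RE.match(filename)
    if match is None:
        return None
    year, month, day, slug = match.groups()
    return "/%s/%s/%s/%s/" % (year, month, day, slug)


def suggest_redirects(old_urls, filenames, cutoff=0.6):
    """Map each old URL to the new URL whose slug is the closest fuzzy match.

    Old URLs with no candidate above the cutoff are left out, so they can
    be reviewed by hand.
    """
    by_slug = {}
    for name in filenames:
        match = ARTICLE_RE.match(name)
        if match:
            by_slug[match.group(4)] = new_url_of(name)

    redirects = {}
    for old in old_urls:
        candidates = difflib.get_close_matches(
            slug_of(old), list(by_slug), n=1, cutoff=cutoff
        )
        if candidates:
            redirects[old] = by_slug[candidates[0]]
    return redirects
```

Each pair in the returned dictionary could then be written out as one rack-rewrite rule, for example a line of the form r301 '/old/path/', '/new/path/'. The date cannot be derived from the old URL at all, which is why a lookup against the file names is needed rather than a single regex.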

Credit

I scoured a lot of resources to create this site (as you can see from the number of links in this post). However, the following inspired me most:

Steve Losh
Jason Blevins
Ethan Schoonover’s Solarized

Of course, the site itself is open source. Hope you learn something from it!