converting a blosxom weblog

Blosxom takes as input a source directory of text files and outputs (dynamically or statically) a hierarchy of web pages. The source material is represented multiple times in the output.

A source file ./foo/bar/baz.txt is considered to be in the category 'foo/bar' and will be rendered at $website/foo/bar/. It will also be rendered at $website/YYYY/MM/DD/, where YYYY-MM-DD is the date the input file was last modified. This date-based scheme is the one I used as the "primary" structure for my blosxom site.
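The mapping described above can be sketched roughly as follows. This is only an illustration, not blosxom's own code: the function name `blosxom_paths` is made up, and it uses UTC so the result is reproducible, whereas a real site would presumably use the server's local time.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def blosxom_paths(source, mtime):
    """Return the (category, date) output paths for one source file.

    `source` is a path relative to the blosxom source directory and
    `mtime` is the file's last-modified time as a Unix timestamp.
    """
    category = PurePosixPath(source).parent.as_posix()
    # A file at the top level has no category, so it maps to the site root.
    category_path = "/" if category == "." else f"/{category}/"
    # Assumption: UTC is used here; blosxom itself is not consulted.
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return category_path, stamp.strftime("/%Y/%m/%d/")


category_path, date_path = blosxom_paths("./foo/bar/baz.txt", 1233753409)
print(category_path, date_path)
```

Both returned paths are relative to $website.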

dates

All source files last modified on date YYYY-MM-DD (no matter what category) will be rendered at $website/YYYY/MM/DD/, in reverse-date order.

Blosxom sites are therefore very vulnerable to the input files' timestamps being accidentally modified. If you use the git back-end for ikiwiki, the timestamps of the blobs (files) are not preserved, so you need to encode the time of the post in some other way. The ikiwiki 'meta' plugin allows you to specify the creation date of the page using e.g.

[[!meta  date="Wed Feb  4 13:16:49 GMT 2009"]] 

I'd advise running a script that fetches the last modified time of each page and appends a meta directive to "fix" the page to that time.
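A minimal sketch of such a script, under some assumptions: the source files end in `.txt`, they are UTF-8, and the timestamp is written in GMT in the same shape as the directive shown earlier. The names `meta_date_directive` and `pin_dates` are made up for this example. Run it on a copy of the source directory before the files are committed to git, while the timestamps are still intact.

```python
from datetime import datetime, timezone
from pathlib import Path


def meta_date_directive(mtime):
    """Format a Unix timestamp as an ikiwiki meta date directive (GMT)."""
    d = datetime.fromtimestamp(mtime, tz=timezone.utc)
    # The day of the month is space-padded, as in the output of date(1).
    return f'[[!meta date="{d:%a %b} {d.day:2d} {d:%H:%M:%S} GMT {d:%Y}"]]'


def pin_dates(root):
    """Append a meta date directive to every .txt file under `root`."""
    for path in sorted(Path(root).rglob("*.txt")):
        text = path.read_text(encoding="utf-8")
        if "[[!meta date=" in text:
            continue  # already pinned; do not add a second directive
        # Read the timestamp before writing, since writing updates it.
        mtime = path.stat().st_mtime
        pinned = text.rstrip("\n") + "\n\n" + meta_date_directive(mtime) + "\n"
        path.write_text(pinned, encoding="utf-8")


# Example (hypothetical directory name):
# pin_dates("blosxom-entries")
```

The check for an existing directive makes the script safe to run twice, which matters because the second run would otherwise pin every page to the time of the first run.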

format

Blosxom pages are almost pure HTML: the first line might be the page title, and in some cases anything before the first pair of newlines might be other metadata (depending on your precise blosxom setup). I had a small script that ate the first line of each file and appended a meta directive to set the page title to this value, as in my case, all of the blosxom files fit this format.
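Something along these lines would do the title conversion. It is a sketch rather than the script I actually used: `convert_entry` is a made-up name, it assumes the first line is always the title with no other metadata lines, and it falls back on ikiwiki's triple-quoted parameter form when the title itself contains a double quote.

```python
def convert_entry(text):
    """Move the first line of a blosxom entry into a meta title directive."""
    # Everything up to the first newline is taken to be the title.
    title, _, body = text.partition("\n")
    title = title.strip()
    # Assumption: titles containing double quotes are triple-quoted.
    quote = '"""' if '"' in title else '"'
    directive = f"[[!meta title={quote}{title}{quote}]]"
    return body.strip("\n") + "\n\n" + directive + "\n"


print(convert_entry("Hello world\n<p>First post.</p>\n"))
```

If your blosxom setup puts further metadata before the first blank line, that header block would need to be stripped or translated as well.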

date pages

In order to continue to support access via the YYYY/MM/DD scheme, I added some pages under the blog hierarchy that inlined matching pages:

[[!tag  aggregation]]
[[!inline  pages="log/* and !*/Discussion and !link(tag/aggregation)
  and creation_year(2008)" show=5 feeds=no]]

I needed to prevent these pages (as well as other pages inlining log posts) from including themselves. I therefore tag them all 'aggregation' and explicitly exclude such tagged pages from the inlines.
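Writing these pages by hand gets tedious, so they could be generated. The sketch below builds the page text for a year, and optionally a month and day. The helper name `date_page` is made up, the `log/*` prefix is simply the one from my own site, and the month and day variants assume ikiwiki's `creation_month()` and `creation_day()` pagespec functions, which I only used in their `creation_year()` form above.

```python
def date_page(year, month=None, day=None, show=5):
    """Return the text of an aggregation page for one year, month or day."""
    terms = [
        "log/*",
        "!*/Discussion",
        "!link(tag/aggregation)",  # keep aggregation pages out of each other
        f"creation_year({year})",
    ]
    if month is not None:
        terms.append(f"creation_month({month})")
    if day is not None:
        terms.append(f"creation_day({day})")
    pages = " and ".join(terms)
    return (
        "[[!tag aggregation]]\n"
        f'[[!inline pages="{pages}" show={show} feeds=no]]\n'
    )


print(date_page(2008))
print(date_page(2009, month=2, day=4))
```

Each result would be saved as the source for the matching YYYY, YYYY/MM or YYYY/MM/DD page under the blog.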

Why only show the top 5 posts? I'm mostly interested in making the URIs meaningful, and a page containing every post in a given year is not particularly useful. I need to look at other solutions for organising access to the log posts. There's some discussion of this at planet spamming.