The Perils of Writing Good Documentation
I’ve been thinking about documentation lately, and I feel unsatisfied with the options I currently have available to me for writing and publishing documents. This dissatisfaction is not too well defined; I can’t put my finger on exactly what I want, but when I look at my options I’m not too excited about any of them.
When I say “documentation,” I am talking about several slightly different things:
- Project Homepage for projects like upb and Gazelle. The goal of a project homepage is to answer the question “what is this project and why should I use it?” It should also be attractive enough for a person to feel like this project is high-quality stuff. And of course it should point them to the relevant resources (downloads, source tree, bug tracker, etc). Good examples: http://git-scm.com/, http://www.ruby-lang.org/en/, http://www.gazelle-parser.org.
- Manuals for projects like upb and Gazelle. The goal of the manual is to provide both tutorial-like and reference-like information about how to use the software. Manuals have a lot of structure and are a bit more formal, since they are intended to precisely explain how the software should be used. They tend to track the software itself more closely than the other types of documentation, and are often even checked into the source tree. For example, Gazelle’s Manual.
- Design discussions/rationale. These aren’t quite like a manual because instead of describing how the software works, they describe why the software is the way it is. What are the alternatives to your approach and why did you pick the one you did? What are the trade-offs? I don’t think we see as much of this documentation as we should in the open-source world, but one good example is the Python PEP process.
- General articles about a particular subject. I mean to write some documents that explain the basic ideas of parsing in a more approachable way than most parsing literature. The literature can be a bit oblique, and I think I could do a good job of explaining it in a way that anyone can understand.
The main options that I see available to me are:
- Plain HTML. Even I have come to the conclusion that this isn’t a good choice any more. Too much work to write, too little flexibility, not enough bang for the buck. Of the four documentation kinds above, the only one it remotely makes sense for is the “Project Homepage” case, but even that is too much work for me. Creating the Gazelle homepage took too much work, and it’s not even that awesome.
- Personal Wiki / Markdown. By “Personal Wiki” I mean a wiki that you run yourself. I put this in the same category as Markdown because the two tend to have the same advantages/disadvantages. The advantages are that you can get a reasonably attractive product with minimal effort, and they are fairly customizable. The big disadvantage is that no two lightweight markup languages are compatible, and there are so many to choose from (seriously: Markdown, reStructuredText, Textile, AsciiDoc, and those are just the ones I know off the top of my head). It’s slightly scary to invest a lot into a format that is one of many possible contenders.
- Hosted Wiki, like the Google Code wiki or the GitHub wiki. In this case hosting is taken care of, but you have less control over the look and more stuff cluttering your page. Also, I can’t figure out why, but something about the design of Google Code makes me totally uninspired to write any documents in its wiki. Another thing to note is that if a hosted wiki disappears (GitHub is only a startup, it could totally go under), it’s not clear what happens to your documents!
- DocBook, which is a little better than a Markdown-style scheme because DocBook seems to have gained some critical mass. Still, the DocBook people seem to have a mild-to-moderate case of XML-itis, and the DocBook homepage seems more concerned with spitting acronyms at you than telling you if DocBook is capable of something basic like theming your document in different ways.
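To make the incompatibility point concrete, here is a minimal sketch of how the same top-level heading is spelled in four of the lightweight markup languages listed above. The `heading` function and its format keys are made up for illustration and are not part of any of these tools; only the heading syntax itself comes from the formats.

```python
# Illustrative only: one top-level heading, four incompatible spellings.
# The function name and the format keys are hypothetical.

def heading(title: str, fmt: str) -> str:
    """Return `title` as a top-level heading in the given markup format."""
    if fmt == "markdown":
        return f"# {title}"
    if fmt == "rst":
        # reStructuredText underlines the title with punctuation
        # at least as long as the title text.
        return f"{title}\n{'=' * len(title)}"
    if fmt == "textile":
        return f"h1. {title}"
    if fmt == "asciidoc":
        return f"= {title}"
    raise ValueError(f"unknown format: {fmt}")


if __name__ == "__main__":
    for fmt in ("markdown", "rst", "textile", "asciidoc"):
        print(f"--- {fmt} ---")
        print(heading("Gazelle Manual", fmt))
```

Even for something as simple as a title, a document written for one of these formats will not render correctly in another without conversion.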
So as you can see, I’m not super satisfied with any of my options. The Gazelle Manual uses AsciiDoc, which seems to work ok, and I would probably choose it again. I guess I’d be most inclined to choose either AsciiDoc or DocBook for writing general articles (I like this article about Python types and objects which was made using DocBook and is attractive).
I can’t decide what to do for the Project Homepage or the Design Discussions case. I really want to have attractive Project Homepages, but I don’t have too much web design talent and HTML is too much work for me. For Design Discussions I guess I’m leaning towards the GitHub wiki just because it pairs with the project hosting nicely, though I am somewhat uncomfortable with the idea that GitHub could disappear one day, and moving my documents from the GitHub wiki somewhere else sounds like a headache.
Hi Josh,
I was faced with the same problem a couple of years back. Back then I put all documentation in our new and shiny Confluence Wiki, but found out that the export wasn’t very good (read: unusable).
As a result of this, I started to develop the Scroll Wiki Exporter for Confluence, which exports trees of wiki pages to DocBook or PDF. That way you can work on the documentation collaboratively and have the means to further process your documents (based on DocBook).
For both Confluence and Scroll, free community editions are available.
-Stefan
Hi Josh,
I used DocBook for a couple of cafepy articles and even though it tends to be fairly verbose because of the XML tags, I would use it again. It has some built-in features (such as callouts: http://www.docbook.org/tdg/en/html/callout.html, multiple output formats from the same source, etc.) which make it attractive.
All the XML tags do slow you down, so for shorter articles where I don’t need all these features, I stick with Markdown or reST-based stuff. It should be possible to convert from any format to any other, so I’m not worried about being locked in. Also look at Sphinx (http://sphinx.pocoo.org/index.html): it is reST-based and was written for the Python language documentation. Looks like it can even do multiple output formats.