Converting legacy documents into HTML
The World-Wide Web has created an unprecedented flood of
information in the world. Back in prehistoric times, finding
information used to take up a significant part of the day for
many of us, whether we were looking for technical, scientific, legal or any
other kind of information. I remember the frustration when I needed
days to find the specification of a file format or an electronic
component (in books made of paper, yes, paper, young people, can you
believe it?).
These dark ages are gone.
We are now in an era where obtaining the data we need daily is no longer
a matter of storing voluminous technical books, subscribing to catalog
update newsletters, or having access to an expensive
library. We live in the glory days where we just need to get access to an
indexer, type a few keywords and get everything from the organization of
Unix disk blocks to French cooking (how sweet!). Known pioneers
from Ted Nelson to Tim Berners-Lee, from Marc Andreessen and Eric
Bina to Louis Monier, and many other known and unknown people deserve
the credit for that. As does every anonymous ant that works daily to put
more and more documents on the Web.
Is it that simple? Unfortunately not.
The point is that humanity
didn't start thinking the day Marc and Eric launched Mosaic and it
didn't crash immediately. A large body of documents was written
before the birth of the Web (or Gopher, or even FTP), and we still use
them. Day after day, they are converted to HTML so that we can put them on
the Web. Producing a good, useful document from these legacy docs is
not always easy, especially considering that they weren't designed
to be put on the Web.
So far, I have converted four non-trivial documents for the Web: the
Inter-Client Communication Conventions Manual, the GIF89a
specification, the GWM Manual, and the Xlib Programming Manual.
I would like to share the little experience I gained doing this work, so
that others can avoid repeating some of my errors.
Hypertext is more than text
A good hypertext document is more than text with some tags added to
make it look fancy. A good hypertext document should contain
hyperlinks. A seminal paper on hypertext by Vannevar Bush was entitled
"As We May Think", and this is the essence of hypertext: when, while
reading, you ask yourself what this or that means, you
should be able to just click on the puzzling word (or take whatever
action is appropriate) and find out for yourself who this Vannevar
Bush was and what he wrote.
Always provide a way to read the document linearly
Although hypertext is in many ways superior to plain old text, a good
hypertext document is designed as a hypertext document,
while a plain old text was written to be read linearly, and that is
how it is most understandable. So you should provide a way to
read the document from the first word to the last in a linear manner.
Do it out of respect for the original author's work, if for no other
reason.
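One way to keep the linear path available is to give every generated page a "previous" and a "next" link that follow the original reading order. The following is only a minimal sketch in Python, not the method used for the documents mentioned above: the `navigation_bars` function, the `_link` helper and the page names in the usage note are hypothetical, and it assumes you already have the list of pages in the order of the original document.

```python
import html


def _link(url, text):
    """Build one HTML anchor, escaping the URL and the visible text."""
    return '<a href="{}">{}</a>'.format(
        html.escape(url, quote=True), html.escape(text)
    )


def navigation_bars(pages):
    """Return {url: HTML navigation line} for pages given in reading order.

    `pages` is a list of (url, title) pairs, in the order of the
    original (linear) document.
    """
    bars = {}
    for i, (url, _title) in enumerate(pages):
        links = []
        if i > 0:
            # Every page but the first points back to its predecessor.
            prev_url, prev_title = pages[i - 1]
            links.append(_link(prev_url, "Previous: " + prev_title))
        if i < len(pages) - 1:
            # Every page but the last points on to its successor.
            next_url, next_title = pages[i + 1]
            links.append(_link(next_url, "Next: " + next_title))
        bars[url] = " | ".join(links)
    return bars
```

Calling `navigation_bars([("intro.html", "Introduction"), ("hints.html", "Hints and properties")])` would give one navigation line per page, to be pasted at the top or bottom of each file. Using the section titles rather than bare "previous" and "next" labels keeps the links meaningful.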
Forget about numerical (or alphabetical) references
In legacy documents, you often find references such as "see section 3,
p. 24". You should forget about this kind of reference. They were
invented in a world without hyperlinks, to allow for fast navigation
through the information. These references should be hyperlinked, of course, but the
link text shouldn't be something as meaningless as "sec. 3". As an
example, consider the following two versions, and guess which one is more
helpful in finding the information you are looking for:
For further details, see sections 4.1.2.4 and 4.1.4.
For further details, see Hints and properties and Changing Window State.
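A conversion script can make this substitution mechanically. The following is a minimal sketch in Python, under stated assumptions rather than a description of how the documents above were converted: the `SECTIONS` table, its file names and the `link_section_refs` function are hypothetical, and the sketch assumes the section numbers and titles have already been collected from the legacy document's table of contents.

```python
import html
import re

# Hypothetical table built from the legacy document's table of contents:
# section number -> (target URL, section title).
SECTIONS = {
    "4.1.2.4": ("hints.html", "Hints and properties"),
    "4.1.4": ("window-state.html", "Changing Window State"),
}

# An optional "section"/"sections" prefix followed by a dotted section
# number such as "4.1.4" (at least two components).
SECTION_REF = re.compile(r"(?:\bsections?\s+)?\b(\d+(?:\.\d+)+)\b")


def link_section_refs(text, sections=SECTIONS):
    """Replace known section numbers with links titled after the section."""

    def replace(match):
        entry = sections.get(match.group(1))
        if entry is None:
            # Unknown number: leave the text untouched rather than guess.
            return match.group(0)
        url, title = entry
        return '<a href="{}">{}</a>'.format(
            html.escape(url, quote=True), html.escape(title)
        )

    return SECTION_REF.sub(replace, text)
```

Applied to the first sentence of the example, this produces the second one, with each title wrapped in a link. Numbers missing from the table are left as they are, so they can be spotted and handled by hand.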
Read and use your own translation
Just as good software is software written by people who use it,
I do think that good hypertext is the hyperlinked document that was
needed by the person who translated it. Try to put yourself in the state of mind of
a novice reader, and browse through your translation. Most of the
time, you will think: "I should have a link here", or "what does this
mean exactly?". This is the best
way to get a rich hypertext document.
Christophe Tronche, ch@tronche.com