From
is set to the email address of the operator (usually tronche@lri.fr).
User-Agent
is set to W3M2/x.xx, where x.xx is the version number. The current version is 0.02, but this is likely to change.
Referer
is set to the link that led W3M2 to the current one.
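To make the three fields above concrete, here is a minimal Python sketch of how such request headers could be assembled. It is only an illustration, not W3M2's actual code: the function name build_headers, its parameters and the example.org URL are hypothetical.

```python
def build_headers(operator_email, version, referring_url=None):
    """Assemble the identification headers described above (hypothetical helper)."""
    headers = {
        "From": operator_email,           # email address of the operator
        "User-Agent": f"W3M2/{version}",  # robot name followed by its version number
    }
    # Assumption: the very first document has no referring link, so Referer is omitted.
    if referring_url is not None:
        headers["Referer"] = referring_url
    return headers


example = build_headers("tronche@lri.fr", "0.02", "http://example.org/index.html")
```

With these arguments, example holds the From, User-Agent and Referer values a server would see for one request.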
W3M2 follows only http URLs (actually, it also unconditionally follows X-CALIBRATE-CHANNEL links, but I bet you've never heard of them).
It starts by discarding everything in the URL from a special character such as # or ? to the end. Then it looks for the extension (the few characters following the dot near the end), if any. If there is none, it follows the link, which is assumed to be an ill-formed pointer to a directory with a default file (there are many of them). If an extension is found, the link is followed only if the extension is html.
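The extension rule above can be sketched in Python as follows. This is an approximation under stated assumptions, not W3M2's actual code: the name should_follow is hypothetical, the extension is taken to be whatever follows the last dot in the last path segment, and the comparison with html is exact because the text does not say whether case matters. The sketch covers only the extension test, not the choice of URL scheme.

```python
from urllib.parse import urlsplit


def should_follow(url):
    """Return True if the extension rule described above would follow this link."""
    # Discard everything from the first '#' or '?' to the end of the URL.
    for special in "#?":
        url = url.split(special, 1)[0]
    # Assumption: only the last path segment can carry an extension,
    # so dots in the host name are ignored.
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    _, dot, extension = last_segment.rpartition(".")
    if not dot:
        # No extension: treated as a pointer to a directory with a default file.
        return True
    return extension == "html"


print(should_follow("http://example.org/docs/index.html#top"))
print(should_follow("http://example.org/docs"))
print(should_follow("http://example.org/logo.gif"))
```

The first two links would be followed (an html document and an extensionless path), while the third would be skipped.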
W3M2 is now compliant with the standard for robot exclusion proposed by Martijn Koster. If you don't want your site to be visited by W3M2, you should create a document called /robots.txt. If the document contains a User-Agent: entry for W3M2, WWWMM or * (case insensitive), then W3M2 won't try to visit the directories listed in the Disallow lines that follow it.
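As an illustration of the rule above, the following Python sketch reads a /robots.txt document and collects the Disallow entries that would apply to W3M2. It is a simplified reading of the exclusion standard, not W3M2's actual parser: the names ROBOT_NAMES, disallowed_prefixes and is_allowed are hypothetical, and treating each Disallow value as a path prefix is an assumption.

```python
ROBOT_NAMES = {"w3m2", "wwwmm", "*"}  # names honoured by W3M2, compared in lower case


def disallowed_prefixes(robots_txt):
    """Collect Disallow values from the records whose User-Agent names this robot."""
    prefixes = []
    applies = False        # does the current record concern W3M2?
    in_agent_run = False   # are we inside a run of consecutive User-Agent lines?
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        field, colon, value = line.partition(":")
        if not colon:
            continue  # blank or malformed line
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not in_agent_run:
                applies = False  # a new record starts here
            in_agent_run = True
            applies = applies or value.lower() in ROBOT_NAMES
        elif field == "disallow":
            in_agent_run = False
            if applies and value:
                prefixes.append(value)
    return prefixes


def is_allowed(path, prefixes):
    """Assumption: a path is excluded when it starts with a disallowed prefix."""
    return not any(path.startswith(prefix) for prefix in prefixes)


sample = """\
# keep robots out of the scratch areas
User-agent: w3m2
Disallow: /tmp/
Disallow: /private/

User-agent: OtherBot
Disallow: /
"""
rules = disallowed_prefixes(sample)
```

Here rules contains /tmp/ and /private/, so is_allowed("/tmp/scratch.html", rules) is false, while the record for OtherBot is ignored.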
W3M2 was written by Christophe Tronche. Many thanks to Martijn Koster, who pointed out W3M2's lack of identification and introduced me to some other robots.