The WWWMM Robot

What is W3M2?

The W3M2 Robot is a Web wanderer written for several experimental purposes, chiefly to study throughput and latency on paths in the net. It does a depth-first traversal (I know, it seems crazy) and collects some data. W3M2 is run only occasionally.

How to identify W3M2?

W3M2 sets the following fields:

Which links does W3M2 follow, and how can you avoid it?

First of all, W3M2 follows only http URLs (actually, it also unconditionally follows X-CALIBRATE-CHANNEL links, but I bet you've never heard of them). It starts by discarding everything in the URL from a funny character such as # or ? to the end. Then it looks for an extension (the few characters following the dot near the end), if any. If there is none, it follows the link, assuming it is an ill-formed pointer to a directory with a default file (there are many of them). If there is an extension, it follows the link only if that extension is html.
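
The link filter described above can be sketched roughly as follows. This is only an illustration in Python, not W3M2's actual code: the function name should_follow is made up, the X-CALIBRATE-CHANNEL case is left out, only # and ? are treated as "funny characters", and the case-insensitive comparison of the extension is an assumption.

```python
def should_follow(url):
    """Return True if a link passes the filter described in the text.

    Hypothetical helper: the name and the exact rules for edge cases
    are assumptions, not taken from W3M2 itself.
    """
    # Only http URLs are followed (X-CALIBRATE-CHANNEL links are ignored here).
    if not url.lower().startswith("http://"):
        return False

    # Discard everything from the first '#' or '?' to the end of the URL.
    for marker in ("#", "?"):
        url = url.split(marker, 1)[0]

    # Separate the host from the path so that dots in the host name
    # are not mistaken for an extension.
    rest = url[len("http://"):]
    _host, _, path = rest.partition("/")
    last_segment = path.rsplit("/", 1)[-1]

    # No extension: assumed to point at a directory with a default file.
    if "." not in last_segment:
        return True

    # An extension was found: follow only if it is "html"
    # (case-insensitive matching is an assumption).
    extension = last_segment.rsplit(".", 1)[1]
    return extension.lower() == "html"
```

With these assumptions, http://example.com/docs/index.html#top and http://example.com/docs would be followed, while http://example.com/pic.gif would not.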

W3M2 is now compliant with the standard for robot exclusion proposed by Martijn Koster. If you don't want your site to be visited by W3M2, you should create a document called /robots.txt.

If the document contains a User-Agent entry naming W3M2, WWWMM or * (case-insensitive), then W3M2 won't try to visit the directories listed in the Disallow lines that follow it.
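
As a rough illustration of this rule, here is a minimal Python sketch of how a robot might read /robots.txt. It is not W3M2's actual code: the names ROBOT_NAMES, disallowed_prefixes and is_allowed are hypothetical, and it assumes that records are separated by blank lines and that the Disallow lines of every matching record are simply added together.

```python
# Names this robot answers to, compared case-insensitively.
ROBOT_NAMES = {"w3m2", "wwwmm", "*"}


def disallowed_prefixes(robots_txt):
    """Collect the Disallow prefixes from records that name this robot."""
    prefixes = []
    applies = False    # does the current record address this robot?
    in_agents = False  # still reading the User-Agent lines of a record?
    for raw in robots_txt.splitlines():
        if not raw.strip():
            # A blank line ends the current record.
            applies = False
            in_agents = False
            continue
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # comment-only line: not a record boundary
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not in_agents:
                # First User-Agent line of a new record.
                applies = False
                in_agents = True
            if value.lower() in ROBOT_NAMES:
                applies = True
        elif field == "disallow":
            in_agents = False
            # An empty Disallow value excludes nothing.
            if applies and value:
                prefixes.append(value)
    return prefixes


def is_allowed(path, robots_txt):
    """Return True if the path matches none of the collected prefixes."""
    return not any(path.startswith(p) for p in disallowed_prefixes(robots_txt))
```

For example, a /robots.txt containing the lines "User-agent: W3M2" and "Disallow: /private/" would, under these assumptions, keep the robot out of /private/ while leaving the rest of the site open to it.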

What does the MM of WWWMM stand for?

No one really knows. Some have said it is Mad Max.

Raw results from W3M2 travels

The number at the end of each line is the document size.

W3M2 was written by Christophe Tronche. Many thanks to Martijn Koster, who pointed out W3M2's lack of identification and introduced me to some other robots.

More about Web robots


Christophe Tronche, ch@tronche.com