I want to discuss the possibility that this bot is somehow different, in that it does not find you in the conventional fashion.
Or,
Let me ask why this is the first bot I've ever seen that regularly comes to you from sites on which you are listed, rather than (what I understand to be) the conventional way?
64.124.85.194 - - [23/Dec/2004:03:23:35 -0800] "GET /robots.txt HTTP/1.1" 200 1727 "-" "Mozilla/5.0 (compatible; BecomeBot/2.0beta; +http://www.become.com/webmasters.html)"
64.124.85.194 - - [23/Dec/2004:03:23:43 -0800] "GET / HTTP/1.1" 200 20407 "http:/MajorDirectoryWhereListed/" "Mozilla/5.0 (compatible; BecomeBot/2.0beta; +http://www.become.com/webmasters.html)"
Switched IPs and Versions...
64.124.85.91 - - [23/Dec/2004:05:03:41 -0800] "GET /robots.txt HTTP/1.1" 200 1727 "-" "Mozilla/5.0 (compatible; BecomeBot/1.23; +http://www.become.com/webmasters.html)"
64.124.85.91 - - [23/Dec/2004:05:03:55 -0800] "GET /robots.txt HTTP/1.1" 301 264 "-" "Mozilla/5.0 (compatible; BecomeBot/1.23; +http://www.become.com/webmasters.html)"
64.124.85.91 - - [23/Dec/2004:05:03:55 -0800] "GET /old,OLDfilename.html HTTP/1.1" 403 262 "http://LinkedToMe.htm" "Mozilla/5.0 (compatible; BecomeBot/1.23; +http://www.become.com/webmasters.html)"
Either way, it always seems to come to my site through another site on which I am listed.
It seems to me that it's doing exactly what other 'bots do: following links on known pages to discover other not-yet-known pages.
What's different is that it *tells* you how it found your page by providing a referrer string, whereas most 'bots don't. A sophisticated 'bot might take a three-phase approach: first collect many links into a database, then remove duplicate links, then pull that de-duplicated set from the database and spider it, later repeating the process. This 'bot apparently follows newly-discovered links immediately -- an approach better-suited to small indexes than to large.
Using the de-duplication method I posit for spiders like Googlebot above, they can remove many duplicate links before spidering, whereas if you spider links immediately, you won't have any method to "remember" previously-spidered pages that have several links pointed to them. So, you might follow multiple links to the same page. But on the other hand, with this method you can provide a meaningful referrer string for each page that you spider.
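To make the trade-off concrete, here is a minimal sketch of the two strategies run over a toy link graph. This is only an illustration of the idea as I've described it, not how Becomebot or any real spider is actually built; the function names, the graph, and the page names are all made up for the example.

```python
# Toy link graph (hypothetical): page -> pages it links to.
# Two directory pages both link to the same site.
LINKS = {
    "seed": ["dirA", "dirB"],
    "dirA": ["site"],
    "dirB": ["site"],
}

def crawl_immediately(graph, seed, max_depth=2):
    """Follow each link as soon as it is found, keeping no memory of
    pages already fetched. Every fetch has a meaningful referrer."""
    fetches = []  # list of (url, referrer)

    def visit(url, referrer, depth):
        fetches.append((url, referrer))
        if depth == max_depth:
            return
        for link in graph.get(url, []):
            visit(link, url, depth + 1)

    visit(seed, None, 0)
    return fetches

def crawl_in_batches(graph, seed, max_rounds=2):
    """Collect links, de-duplicate them, then fetch each unique URL once.
    No single referrer is associated with a fetch, so none is sent."""
    fetches = []  # list of (url, referrer); referrer is always None here
    seen = {seed}
    frontier = [seed]
    for _ in range(max_rounds + 1):
        discovered = []
        for url in frontier:
            fetches.append((url, None))
            discovered.extend(graph.get(url, []))
        # De-duplication phase: keep only URLs not already queued or fetched.
        frontier = []
        for link in discovered:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
        if not frontier:
            break
    return fetches
```

With this graph, the immediate crawl fetches "site" twice (once with referrer "dirA", once with "dirB"), while the batched crawl fetches it once with no referrer at all.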
So, it's interesting that this 'bot supplies a referrer, and it would be interesting to see if you can detect duplicate incoming links by observing Becomebot spidering the *same* page on your site using several *different* referrers.
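If you want to try that, something along these lines might do it. It's a rough sketch that assumes your server writes the Apache "combined" log format, like the lines you posted; the regular expression and the function names are my own and not from any particular log tool, so adjust them to suit your logs.

```python
import re
from collections import defaultdict

# Apache "combined" log format (assumed): ip, ident, user, [time],
# "request", status, size, "referrer", "user-agent".
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def referrers_by_page(lines, agent_substring="BecomeBot"):
    """Map each requested path to the set of referrers the 'bot sent for it."""
    pages = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match or agent_substring not in match.group("agent"):
            continue  # unparseable line, or some other user-agent
        referrer = match.group("referrer")
        if referrer and referrer != "-":  # "-" means no referrer was sent
            pages[match.group("path")].add(referrer)
    return pages

def pages_with_multiple_referrers(lines, agent_substring="BecomeBot"):
    """Return only the paths that were fetched via more than one referrer."""
    return {
        path: sorted(refs)
        for path, refs in referrers_by_page(lines, agent_substring).items()
        if len(refs) > 1
    }
```

Feed it the lines of your access log, for example `pages_with_multiple_referrers(open("access.log"))` (the file name is just a placeholder), and any page that shows up was reached by the 'bot through more than one incoming link.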
This de-duplication method, by the way, is the reason that most spiders don't provide a referrer: they may be spidering your page because they found *several* references to it, and may even require that several pages refer to your page before they will spider it. If they do have several referrers for your page, then obviously they'd have to pick which referrer to provide to you -- an extra step in the process, so I suspect that most simply don't bother.
Your discovery opens an interesting window into the internal operation of this robot.
Jim