BecomeBot/1.23 - New Version - Crawler, Spider, and User Agent ID forum at WebmasterWorld - WebmasterWorld

BecomeBot/1.23 - New Version

Seems to find my site thru those who link to me?

pendanticist

4:05 pm on Dec 23, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Spinoff of this thread: [webmasterworld.com...]

I want to discuss the possibility that this bot is somehow different, in that it does not find you in the conventional fashion.

Or,

Let me ask why this is the first bot I've ever seen that regularly comes to you from sites on which you are listed, rather than (as I understand it) in the conventional way?

64.124.85.194 - - [23/Dec/2004:03:23:35 -0800] "GET /robots.txt HTTP/1.1" 200 1727 "-" "Mozilla/5.0 (compatible; BecomeBot/2.0beta; +http://www.become.com/webmasters.html)"
64.124.85.194 - - [23/Dec/2004:03:23:43 -0800] "GET / HTTP/1.1" 200 20407 "http:/MajorDirectoryWhereListed/" "Mozilla/5.0 (compatible; BecomeBot/2.0beta; +http://www.become.com/webmasters.html)"

Switched IPs and Versions...

64.124.85.91 - - [23/Dec/2004:05:03:41 -0800] "GET /robots.txt HTTP/1.1" 200 1727 "-" "Mozilla/5.0 (compatible; BecomeBot/1.23; +http://www.become.com/webmasters.html)"
64.124.85.91 - - [23/Dec/2004:05:03:55 -0800] "GET /robots.txt HTTP/1.1" 301 264 "-" "Mozilla/5.0 (compatible; BecomeBot/1.23; +http://www.become.com/webmasters.html)"
64.124.85.91 - - [23/Dec/2004:05:03:55 -0800] "GET /old,OLDfilename.html HTTP/1.1" 403 262 "http://LinkedToMe.htm" "Mozilla/5.0 (compatible; BecomeBot/1.23; +http://www.become.com/webmasters.html)"

Either way, it always seems to come to my site through another site on which I am listed.

Lord Majestic

4:00 pm on Dec 26, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

> Switched IPs and Versions...

A technicality, but more likely the new beta version is being tested on a separate machine (hence the new IP). Both are fed from the same URL server, so they are likely to visit similar sites at the same time.

jdMorgan

7:31 pm on Dec 26, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

> I want to discuss the possibility that this bot is somehow different in that it does not find you in conventional fashion.

It seems to me that it's doing exactly what other 'bots do: following links on known pages to discover other not-yet-known pages.

What's different is that it *tells* you how it found your page by providing a referrer string, whereas most 'bots don't. A sophisticated 'bot might take a three-phase approach: first collect many links into a database, then remove duplicate links, and finally pull the de-duplicated set from the database and spider it, later repeating the process. This 'bot apparently follows newly-discovered links immediately, an approach better suited to small indexes than to large ones.

Using the de-duplication method I posit for spiders like Googlebot above, they can remove many duplicate links before spidering, whereas if you spider links immediately, you won't have any method to "remember" previously-spidered pages that have several links pointed to them. So, you might follow multiple links to the same page. But on the other hand, with this method you can provide a meaningful referrer string for each page that you spider.
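To make the contrast concrete, here is a minimal sketch of the two strategies on a toy link graph. Everything in it is an assumption for illustration: the `LINKS` graph, the page names, and both functions are hypothetical, and nothing here is known about how BecomeBot or any other spider is actually built.

```python
from collections import deque

# Hypothetical toy link graph: page -> pages it links to.
LINKS = {
    "directory": ["site", "blog"],
    "blog": ["site"],
    "site": [],
}

def crawl_batched(seeds, links):
    """Collect links, de-duplicate them, then fetch each URL once.

    No referrer is sent, because a URL may have been found on several pages.
    """
    fetched = []            # (url, referrer) pairs in fetch order
    seen = set(seeds)
    frontier = list(seeds)
    while frontier:
        discovered = []
        for url in frontier:
            fetched.append((url, None))
            discovered.extend(links.get(url, []))
        # De-duplicate the collected batch before the next round.
        frontier = []
        for url in discovered:
            if url not in seen:
                seen.add(url)
                frontier.append(url)
    return fetched

def crawl_immediate(seeds, links, max_fetches=100):
    """Follow every discovered link right away, sending the linking page as referrer.

    max_fetches is a safety cap, since nothing here remembers visited pages.
    """
    fetched = []
    queue = deque((seed, None) for seed in seeds)
    while queue and len(fetched) < max_fetches:
        url, referrer = queue.popleft()
        fetched.append((url, referrer))
        for target in links.get(url, []):
            queue.append((target, url))
    return fetched

print(crawl_batched(["directory"], LINKS))
print(crawl_immediate(["directory"], LINKS))
```

In this toy graph the batched crawl requests "site" once with no referrer, while the immediate crawl requests it twice, once with "directory" and once with "blog" as the referrer.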

So, it's interesting that this 'bot supplies a referrer, and it would be interesting to see if you can detect duplicate incoming links by observing Becomebot spidering the *same* page on your site using several *different* referrers.
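One way to run that check is to group the bot's requests by page and collect the distinct referrers for each. The sketch below assumes the log is in the Apache "combined" format shown earlier in the thread; the regular expression and the function names are my own, so adjust them to your server's format.

```python
import re
from collections import defaultdict

# Apache "combined" format:
# ip ident user [time] "request" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def referrers_by_page(lines, agent_token="BecomeBot"):
    """Map each requested path to the set of referrers the bot sent for it."""
    pages = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match or agent_token not in match["agent"]:
            continue
        referrer = match["referrer"]
        if referrer and referrer != "-":   # "-" means no referrer was sent
            pages[match["path"]].add(referrer)
    return dict(pages)

def pages_with_multiple_referrers(lines, agent_token="BecomeBot"):
    """Keep only the pages the bot reached from more than one referrer."""
    return {
        path: sorted(refs)
        for path, refs in referrers_by_page(lines, agent_token).items()
        if len(refs) > 1
    }
```

Typical use would be `pages_with_multiple_referrers(open("access.log"))`, where the file name is a placeholder. Any page it returns was spidered from at least two different linking pages.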

This de-duplication method, by the way, is the reason that most spiders don't provide a referrer. They may be spidering your page because they found *several* references to it, and may even require that several pages refer to your page before they will spider it. If they do have several referrers for your page, then obviously they'd have to pick which referrer to provide to you. That's an extra step in the process, so I suspect that most simply don't bother.

Your discovery opens an interesting window into the internal operation of this robot.

Jim

pendanticist

5:53 pm on Dec 28, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Thanks Jim. :)

Great explanation!

I've been following this bot the last few days and found it replicates the pattern I described.

pendanticist

2:45 am on Jan 30, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

64.124.85.92 - - [29/Jan/2005:18:18:26 -0800] "GET /robots.txt HTTP/1.1" 200 1751 "-" "Mozilla/5.0 (compatible; BecomeBot/1.70; MSIE 6.0 compatible; +http//www.become.com/webmasters.html)"

Don't version numbers usually increase in value, rather than decrease?

wilderness

5:18 pm on Mar 11, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Interesting read from ClickZ Newsletter:

Who's Shopping? Who's Researching?
by Rebecca Lieb
http ://nl.internet.com/ct.html?rtr=on&s=1,1g9j,1,65hv,iuv2,lwo6,heam