/this_is_a_test_of_404_response

Not Google, it's BecomeBot!

         

JAB Creations

12:10 pm on Oct 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I recently read on hudin's site that "/this_is_a_test_of_404_response" was Google checking for 404 responses. Well, I just had this happen recently (6 instances), and they were all from BecomeBot, which to my knowledge is independent of Google.

Any other takes on this?

MatthewHSE

3:40 pm on Oct 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was just going to post on this. Same request, same UA, very irritating.

Why should they run automated tests of my site, trying for a 404? I'm going to investigate a bit more, but at the moment I'm leaning toward banning them.

Staffa

5:00 pm on Oct 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Last week I saw the same request.

I have had BecomeBot banned since it first appeared. They have nothing to show for their crawling and are therefore wasting my bandwidth ;o)

MatthewHSE

8:26 pm on Oct 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I went to the URL given in the UA string for this bot, found a contact form, and sent them a quick request for information about what these 404 response requests were about. Here's part of what I got back:

Many sites on the web use a "soft-404" where they will return a
valid page containing a customized error message (and often some
navigation assistance). By checking for the presence of
these pages it helps us to know if these exist on a site we are
crawling, and to avoid crawling and indexing them.

This seems reasonable to me, and they don't seem to use much bandwidth on my site, and I'm nowhere near my bandwidth limit, so I'm going to let them keep coming by. Always time to ban them later if they misbehave! ;)

Staffa

8:50 pm on Oct 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Many sites on the web use a "soft-404" ................ (and often some navigation assistance).

So are they saying that if they can't get in via the front door, they'll try it via the back door?

Dijkgraaf

9:18 pm on Oct 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, what they are saying is that some web sites, rather than issuing an actual 404 error code when a nonexistent resource is requested, will serve up an "error page" with a 200 code.
To avoid indexing a lot of nonexistent pages, they run a test to see if this is the case. Later, if they get a page back that exactly matches that error page, they know not to bother indexing it because the URL is not valid.
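
For anyone who wants to see the idea in code, here is a rough sketch of the check described above. It is not BecomeBot's actual implementation, which none of us has seen. The `fetch` callable, the function names, and the whitespace normalization before comparing are all my own assumptions for illustration.

```python
import hashlib
from typing import Callable, Optional, Tuple

# Hypothetical fetcher type: takes a URL and returns (status_code, body).
# Plug in whatever HTTP client you like; nothing here does network I/O itself.
Fetch = Callable[[str], Tuple[int, str]]

# The probe path reported in this thread.
PROBE_PATH = "/this_is_a_test_of_404_response"


def fingerprint(body: str) -> str:
    """Hash the body after collapsing whitespace (an assumption, so that
    trivial spacing differences do not defeat the comparison)."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def learn_soft_404(site: str, fetch: Fetch) -> Optional[str]:
    """Request a URL that should not exist. If the site answers 200 anyway,
    remember what its error page looks like; otherwise return None."""
    status, body = fetch(site + PROBE_PATH)
    if status == 200:
        return fingerprint(body)
    return None


def is_soft_404(status: int, body: str, error_fp: Optional[str]) -> bool:
    """True when a later 200 response matches the remembered error page."""
    return (
        status == 200
        and error_fp is not None
        and fingerprint(body) == error_fp
    )
```

A crawler would call `learn_soft_404` once per site and then run `is_soft_404` on every page it fetches, skipping the ones that match.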

larryhatch

9:28 pm on Oct 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's fine. Now why do they pretend to be Google?

jatar_k

9:31 pm on Oct 26, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



>> why do they pretend to be Google

how are they pretending to be G?

>> they will serve up an "error page" with a 200 code

essentially they are trying to account for some people's foolishness

SEOMike

4:03 pm on Oct 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I did some more checking on this issue and found a pretty good URL about the Bot. [become.com...] You can check there for a range of IPs to be sure it's their bot hitting your site. It sounds like it is from the response from their support.
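
If you want to automate that IP check against your logs, something like this sketch would do it. The ranges below are placeholders from the reserved documentation address space, NOT BecomeBot's real ranges; substitute whatever their page lists. The function name is my own.

```python
import ipaddress

# PLACEHOLDER ranges (RFC 5737 documentation addresses), not the bot's
# real ranges. Replace with the ranges published by the bot operator.
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_from_published_range(remote_addr: str, ranges=PUBLISHED_RANGES) -> bool:
    """Return True if the requesting IP falls inside one of the ranges."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address (e.g. a garbled log field).
        return False
    return any(addr in net for net in ranges)
```

A request that claims the bot's UA but comes from outside the published ranges is probably something else borrowing the name.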

kevinpate

4:17 pm on Oct 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Their bot seeks to 'identify shopping-related web sites' and as that isn't my focus, they're simply fed a steady diet of tasty 403 tidbits.
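
For the curious, here is roughly what that looks like as a minimal sketch, assuming a Python WSGI application rather than a server-level rule. The `block_bots` wrapper and the substring list are hypothetical names of mine, and matching on the User-Agent only works as long as the bot identifies itself honestly.

```python
# Lowercase substrings to look for in the User-Agent header (an assumption;
# adjust to whatever your logs actually show).
BLOCKED_UA_SUBSTRINGS = ("becomebot",)


def block_bots(app, blocked=BLOCKED_UA_SUBSTRINGS):
    """Wrap a WSGI app so that matching user agents get a 403."""

    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in ua for token in blocked):
            body = b"Forbidden"
            start_response(
                "403 Forbidden",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Everyone else is passed through to the real application.
        return app(environ, start_response)

    return middleware
```

Wrap your application once at startup, e.g. `application = block_bots(application)`.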

larryhatch

9:25 pm on Oct 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



" >> why do they pretend to be Google?

how are they pretending to be G? "

My mistake. I checked my logs. They identify themselves as Becomebot.

I was reacting to post #1 where somebody indicated identification as Google. -Larry

g1smd

1:33 pm on Oct 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have found a site where certain URL typos take you to a Custom 404 Page with a page title of "Error 404" and containing some basic site navigation, but that page is served with a status of "302"... aaarrrgghhh!