Googlebot/Test

Forum Moderators: DixonJones

64.68.89.144

Stefan

2:53 am on Mar 6, 2004 (gmt 0)

The server is responding with a 406. Anybody have an idea of what this is? Our resident Google rep doesn't seem to be interested in identifying it.

jbgilbert

2:56 am on Mar 6, 2004 (gmt 0)

406 is:
Client Error - Not Acceptable

very interesting response... never seen that one.

jbgilbert

2:57 am on Mar 6, 2004 (gmt 0)

stefan,
If you will sticky mail me the url that is getting this error, I'll run a sniffer to see if I can find out more.

Dreamquick

4:14 am on Mar 6, 2004 (gmt 0)

For a start, that IP (64.68.89.144) isn't registered to Google as their normal crawlers would be. Instead it's part of an Exodus / Cable & Wireless block, so it could potentially be something else masquerading as Googlebot.

I've seen 406s a few times from one specific bot (which used to hunt down multimedia files), but it was very rare...

Reading through a few sites (for the most part Google is your friend), the general consensus is that the server generates a 406 when the client requests a resource whose type is not covered by the list of types the client claims to accept/understand (using the "Accept:" field in the request header).

Search for "406 Not Acceptable" here (interesting read);
[w3.org...]

For example, if I were to create a bot that only understood HTML and GIF, I could set its Accept header to "Accept: text/html, image/gif". In theory, if it were to accidentally request a resource with any other MIME type, for example a PNG, the server could return a 406 error code because it has been told my bot isn't capable of interpreting that file type.
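To make that example concrete, here is a minimal Python sketch of the check a server could make. It is a simplification, not how any particular server implements it: the function names are made up for illustration, q-values are ignored, and real servers often send the resource anyway instead of returning 406.

```python
def parse_accept(header):
    """Split an Accept header into lowercase media ranges, dropping parameters such as q-values."""
    ranges = []
    for part in header.split(","):
        media_range = part.split(";")[0].strip().lower()
        if media_range:
            ranges.append(media_range)
    return ranges


def is_acceptable(accept_header, content_type):
    """Return True if content_type matches any media range the client listed."""
    content_type = content_type.lower()
    main_type = content_type.split("/")[0]
    for media_range in parse_accept(accept_header):
        # Match an exact type, a type wildcard (image/*), or the catch-all (*/*).
        if media_range in ("*/*", content_type, main_type + "/*"):
            return True
    return False


def status_for(accept_header, content_type):
    """Hypothetical helper: 200 if the type is acceptable to the client, otherwise 406."""
    return 200 if is_acceptable(accept_header, content_type) else 406


if __name__ == "__main__":
    bot_accept = "text/html, image/gif"
    for mime in ("text/html", "image/gif", "image/png"):
        print(mime, status_for(bot_accept, mime))
```

With the bot's header from the example, only the PNG request ends up as a 406.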

- Tony

Stefan

4:43 am on Mar 6, 2004 (gmt 0)

Hey thanks, Dreamquick

When I first saw the thing this morning I checked WW forum3 to see if anyone else had spotted it. Others had.

[webmasterworld.com...]

Msg #21 and 23.

GG posted after those messages, didn't acknowledge the posts, and because he seems to read through threads well, it seemed a bit odd.

I think it was a legit googlebot. I only had 2 hits from it out of 40 or so normal bot visits during the same 24 hrs so it was no big deal.

This is what it looked like with the specifics muddied up.

2004-03-05 03:37:30 64.68.89.144 GET /robots.txt 200 1671 139 www.site.org Googlebot/Test -

2004-03-05 03:37:30 64.68.89.144 GET /b**rows_icc_030510.htm 406 4085 134 www.site.org Googlebot/Test -

2004-03-05 03:57:58 64.68.88.18 GET /or**nteering_021125_20.htm 406 4085 138 www.site.org Googlebot/Test -

<edit>

[edited by: Stefan at 5:21 am (utc) on Mar. 6, 2004]

GeorgeGG

4:51 am on Mar 6, 2004 (gmt 0)

Whois query rwhois.exodus.net:4321 by IP address: '64.68.89.144'

'whois -h rwhois.exodus.net:4321 64.68.89.144'

%rwhois V-1.5:001ab7:00 rwhois.exodus.net (Exodus Communications)
network:Class-Name:network
network:Auth-Area:0.0.0.0/0
network:Network-Name:64.68.88.0
network:IP-Network:64.68.88.0/21
network:Organization;I:Google Inc.-BGPconfig-SC3DC3
network:Name;I:Google Inc.
network:Email;I:dns-admin@GOOGLE.COM
network:Street;I:2400 E. Bayshore Pkwy
network:City;I:Mountain View , CA 94043

GeorgeGG

volatilegx

3:36 pm on Mar 9, 2004 (gmt 0)

thanks GeorgeGG

giles

2:18 pm on Mar 10, 2004 (gmt 0)

Another data point: I'm seeing the same thing:

I'm running Apache 1.3.28 with MultiViews and DirectoryIndex turned on. Requests for the directory work (200 returned), but Googlebot's requests for a php file without the extension result in a 406.

contact is a directory with index.php:

crawler11.googlebot.com - - [10/Mar/2004:00:13:46 -0800] "GET /contact/ HTTP/1.0" 200 1344 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

new is actually new.php:

crawler14.googlebot.com - - [10/Mar/2004:00:13:25 -0800] "GET /new HTTP/1.0" 406 421 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
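For anyone wanting to check their own logs for the same thing, a rough Python sketch follows. It assumes the Apache combined log format shown in the two lines above; the pattern and function names are made up for illustration, and other log formats (such as the IIS-style lines earlier in the thread) would need a different pattern.

```python
import re

# Assumed layout: host ident user [time] "method path protocol" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def find_406s(lines):
    """Return (host, path, agent) for every log line whose status code is 406."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("status") == "406":
            hits.append((match.group("host"), match.group("path"), match.group("agent")))
    return hits


if __name__ == "__main__":
    sample = [
        'crawler11.googlebot.com - - [10/Mar/2004:00:13:46 -0800] "GET /contact/ HTTP/1.0" '
        '200 1344 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
        'crawler14.googlebot.com - - [10/Mar/2004:00:13:25 -0800] "GET /new HTTP/1.0" '
        '406 421 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    ]
    for host, path, agent in find_406s(sample):
        print(host, path, agent)
```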

Anyone got any workarounds for this? I don't want to drop from the index!

Giles

giles

4:38 pm on Mar 12, 2004 (gmt 0)

OK. I think I've got to the bottom of this and have a workaround.

I've replicated the error with a home brew browser which only accepts text/html. 406 errors are generated because the server doesn't consider that it can honour the request.
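A test client along those lines can be sketched in a few lines of Python. This is not the home brew browser itself, just an illustration of the idea: the URL and the User-Agent string are placeholders, and the function names are made up.

```python
import urllib.error
import urllib.request


def build_request(url):
    """Build a GET request that claims to accept nothing but text/html."""
    request = urllib.request.Request(url)
    request.add_header("Accept", "text/html")
    # Placeholder User-Agent; the server's negotiation depends on Accept, not on this.
    request.add_header("User-Agent", "accept-test/0.1")
    return request


def fetch_status(url):
    """Send the restrictive request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(build_request(url)) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code (e.g. 406) is on the exception.
        return err.code


if __name__ == "__main__":
    # Hypothetical URL: substitute an extensionless URL served through MultiViews.
    print(fetch_status("http://www.example.com/new"))
```

Pointing it at an extensionless URL that maps to a php file should show whether the server answers 200 or 406.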

Because the request excludes the file extension it triggers the mod_negotiation code of MultiViews. But the target file is a dynamic one (php in my case) with a directive:

AddType application/x-httpd-php .php

So mod_negotiation views the file as being of MIME type application/x-httpd-php, which does not match the accept list in the GET request, hence the 406 Not Acceptable.

I've had a mail from someone who got round the problem by dropping MultiViews and instead using mod_rewrite to add the desired .php ending. That doesn't suit my needs so I kept looking.

You CAN keep MultiViews and dynamic files with restrictive accept lists. I used the type-map to allow me to specify the file content type:

In .htaccess:
# MultiViewsMatch is only needed for Apache 2
MultiViewsMatch Handlers
AddHandler type-map .var

For each php file, fubar.php, create a shadowing fubar.var containing:
URI: fubar

URI: fubar.php
Content-type: text/html

So now, on requesting fubar, MultiViews will scan the directory and give preference to fubar.var. The type-map code of mod_negotiation will then serve up fubar.php as text/html.

This works fine with my test harness - now I've just got to watch for Googlebot's response. Interesting to note that both Apache & Googlebot are behaving correctly - we're just used to most browsers being excessive in what they claim they'll accept.
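With more than a handful of php files, the shadowing .var files could be generated by a script. The Python sketch below assumes every php file in the directory should be served as text/html and writes a type map in the same layout as the fubar.var example; the function names are made up for illustration.

```python
from pathlib import Path


def type_map_text(php_name):
    """Return the type-map body for one php file, in the same layout as fubar.var."""
    stem = Path(php_name).stem
    return f"URI: {stem}\n\nURI: {php_name}\nContent-type: text/html\n"


def write_type_maps(directory):
    """Create a .var file next to each .php file that does not already have one."""
    written = []
    for php_file in sorted(Path(directory).glob("*.php")):
        var_file = php_file.with_suffix(".var")
        if not var_file.exists():  # never overwrite a hand-edited type map
            var_file.write_text(type_map_text(php_file.name))
            written.append(var_file.name)
    return written


if __name__ == "__main__":
    # Runs against the current directory; pass another path as needed.
    for name in write_type_maps("."):
        print("created", name)
```

Existing .var files are left alone, so it can be re-run whenever new php files are added.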
