mod_rewrite throws error 406

only for Googlebot ...

         

RonPK

10:05 pm on Jan 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm using this (in a .htaccess file) to create URLs that are friendly for both humans and SEs:

RewriteCond %{REQUEST_URI} ^\/widgets\/([0-9]{4})\/([0-9]+)$
RewriteRule .* /widgets.php?year=%1&id=%2 [L]

It rewrites
/widgets/2004/23
into
/widgets.php?year=2004&id=23

Works fine, except for Googlebot: Apache returns an error 406 ("Not Acceptable"). At first I thought it might have something to do with Googlebot using HTTP/1.0, but normal browsers using that protocol get the 200 content.

Does anyone know what could be the cause?
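
For reference, the RewriteCond/RewriteRule pair above can probably be collapsed into a single rule. This is only a minimal sketch, assuming the .htaccess file sits in the document root and mod_rewrite is enabled there. In that per-directory context mod_rewrite strips the leading slash before matching, so the pattern starts at "widgets/":

```apache
# Sketch only: assumes this .htaccess lives in the document root.
RewriteEngine On

# Per-directory patterns are matched without the leading slash.
# $1 = four-digit year, $2 = numeric id.
RewriteRule ^widgets/([0-9]{4})/([0-9]+)$ /widgets.php?year=$1&id=$2 [L]
```

The captures are referenced as $1 and $2 here because they come from the RewriteRule pattern itself; %1 and %2 refer to groups captured in a preceding RewriteCond.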

jdMorgan

11:48 pm on Jan 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RonPK,

This is probably a MIME-type problem: whatever MIME type your server sends for your PHP output is not acceptable to Googlebot. Use the server headers checker [webmasterworld.com] to see what your server returns for these requests. Google can handle text/html and the Word and PDF formats, and maybe others, too.

Jim

RonPK

9:29 am on Jan 13, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for your reply, Jim. The MIME-type sent by the server seems fine: text/html. Also, PHP-scripts addressed directly by Googlebot get indexed correctly.

I've done some more research on the error message "no acceptable variant" as it relates to MIME types. My guess now is that the URL /widgets/2003/24, with no extension, might cause some confusion. So I've changed all the links to /widgets/2003/24.html and modified the rewrite rule accordingly. Now I'll just have to wait for the bot to come by again...

jamie

3:24 pm on Jan 13, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



hi ron,

i had this problem with some php scripts as well. i tracked it down to the ForceType directive which i used in my .htaccess

it occurred when accessing a virtual subfolder path, e.g.

photos/12/2003

and using in my .htaccess

<Files photos>
ForceType application/x-httpd-php
</Files>

i stopped using that and changed everything over to AddHandler, and the headers were correct again - even without the .htm extension

i am not sure of the technical reason why it happened, but this might help you track it down

good luck

RonPK

1:01 pm on Jan 14, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



@jamie: I wasn't using any ForceType directive, but thanks anyway. Just the ordinary AddType in the virtual domain parts of httpd.conf.

Adding the .html extension didn't help either. I found some reports about the error message 'no acceptable content' being related to problems with mod_negotiation, so I started fiddling with that. This morning I switched off the MultiViews option, and now Googlebot gets the lovely 200 code again!

I'm definitely no Apache expert, so I won't draw any conclusions on what caused all this. I'm just happy it all works again :)
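
The setup described in this post can be sketched roughly as follows, assuming the .htaccess file is in the document root and that AllowOverride permits Options there. One plausible explanation, not confirmed anywhere in this thread, is that with MultiViews on, mod_negotiation treats /widgets/2004/23 as a request for some variant of "widgets", finds widgets.php, and answers 406 when the MIME type registered for .php is not among the types in the client's Accept header:

```apache
# Sketch only: turn off file-name based content negotiation
# so that requests for /widgets/... are left to mod_rewrite.
Options -MultiViews

RewriteEngine On
# Same rule pair as in the first post of this thread.
RewriteCond %{REQUEST_URI} ^\/widgets\/([0-9]{4})\/([0-9]+)$
RewriteRule .* /widgets.php?year=%1&id=%2 [L]
```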

jimpoo

10:15 pm on Feb 19, 2004 (gmt 0)

10+ Year Member



I use
<FilesMatch "^widgets$">
Options MultiViews
ForceType application/x-httpd-php
</FilesMatch>
to force requests for /widgets/ID12345/abc.html to be handled by widget.php.

Googlebot was still getting a 200 just last week, but these days it gets a 302 error.

I used the Server Header Check and it gets 200 OK.
I'm thinking maybe Googlebot just wants to retrieve the file 'abc.html' and refuses to negotiate with the Apache server.
I still have no idea how to figure it out.
Can anybody help me out?

Thanks.

jimpoo

4:08 pm on Feb 20, 2004 (gmt 0)

10+ Year Member



Hi RonPK,
What did you set in your .htaccess so that Googlebot gets a 200? Are you sure Googlebot is still getting a 200 recently?
I have the same problem: before 01/Feb/2004 Googlebot got a 200, but since 01/Feb/2004 it never does, and I haven't changed anything on the server.

I think it is because Googlebot has just been modified so that it no longer follows the HTTP/1.0 rules and doesn't work with mod_negotiation.

RonPK

4:28 pm on Feb 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi jimpoo, 302 is not an error code.
I switched off multiviews, and that solved my problem. Who needs them anyway...

jdMorgan

4:39 pm on Feb 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As I said in this thread [webmasterworld.com], I encourage you to contact Google and report this problem. They may or may not answer, but if you provide a good, clear description of your problem, they may be able to fix it, or at least provide you with a link to an FAQ or something.

Your chances of getting a reply depend on how good a job you do explaining the problem, how well you document it (include a few lines from your log file, with and without MultiViews enabled), and whether what you submit reads like a professional "problem report" or just a complaint or a rant. A report that is "short, sweet, and to the point," but "well-documented enough to understand the problem" will be ideal.

I'm sure Google would at least be interested in hearing about technical problems.

Best,
Jim

Man in Poland

5:31 pm on Feb 22, 2004 (gmt 0)

10+ Year Member



I am encountering problems similar to those described above. Pages that used to return a 200 are now returning a 406 to Googlebot. I have checked them with the Server Headers Checker, and there seems to be no problem. I am using Options +MultiViews in my .htaccess file, and am interested in replacing it with the AddHandler directive mentioned. Any further advice on how to do this would be much appreciated. I have also taken your advice and reported this to Google (with samples from my log) and do hope that I receive a reply! I wonder how widespread this problem is....?

jimpoo

2:05 am on Feb 26, 2004 (gmt 0)

10+ Year Member



After testing ways to make URLs search engine friendly, I have reached a conclusion.

Don't use this solution:
-------------
Options +MultiViews
<FilesMatch "^widgets$">
ForceType application/x-httpd-php
</FilesMatch>
-------------
With this setup, Googlebot gets a 200 the first time it crawls, but the second time it gets a 406 from the server. It is not a Googlebot bug; it is just how the new Googlebot behaves.

Now I use the mod_rewrite solution, and it works fine:
------------
Options -MultiViews
RewriteEngine on
RewriteBase /
RewriteRule ^(.*)/(.*) /wedgets.php?v1=$1&v2=$2 [L]
------------
Make sure to use 'Options -MultiViews' to turn off MultiViews. This disables Apache's MultiViews content negotiation, so Googlebot has no way to get more information beyond the 200 response.

Googlebot is very tricky.
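
A note on the rule above: the pattern ^(.*)/(.*) matches any path that contains a slash, so requests for images or stylesheets in subdirectories would be rewritten too. A narrower variant might look like the sketch below. The "widgets/" prefix, the script name widgets.php, and the two RewriteCond checks are assumptions for illustration, not taken from the post, and v1/v2 receive different values than with the original pattern (here v1 is the second path segment, v2 the third):

```apache
# Sketch only: prefix, script name, and conditions are assumptions.
Options -MultiViews
RewriteEngine On
RewriteBase /

# Leave requests for existing files and directories alone.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# /widgets/ID12345/abc.html -> /widgets.php?v1=ID12345&v2=abc.html
RewriteRule ^widgets/([^/]+)/([^/]+)$ /widgets.php?v1=$1&v2=$2 [L]
```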