Mod Rewrite stopping site being verified

Google Webmaster tools unable to verify site

         

Karma

10:58 am on Jan 8, 2008 (gmt 0)

10+ Year Member



Hi,

I've had an issue with my mod_rewrite rules for ages now. The rule set is ugly, and for some reason it stops Google's Webmaster Tools from verifying my site, which also means that Google Analytics doesn't work.

If I rename my .htaccess file, everything works fine. Strange.

--------------------------------------------------------------
RewriteEngine on
RewriteRule ^([^_]+)_([^/]+)/([^_]+)_([^/]+)/([^_]+)_([^/]+)/([^_]+)_([^/]+)/([^_]+)_([^/]+)/?$ /index.php?$1=$2&$3=$4&$5=$6&$7=$8&$9=$10 [L]
RewriteRule ^([^_]+)_([^/]+)/([^_]+)_([^/]+)/([^_]+)_([^/]+)/([^_]+)_([^/]+)/?$ /index.php?$1=$2&$3=$4&$5=$6&$7=$8 [L]
RewriteRule ^([^_]+)_([^/]+)/([^_]+)_([^/]+)/([^_]+)_([^/]+)/?$ /index.php?$1=$2&$3=$4&$5=$6 [L]
RewriteRule ^([^_]+)_([^/]+)/([^_]+)_([^/]+)/?$ /index.php?$1=$2&$3=$4 [L]
RewriteRule ^([^_]+)_([^/]+)/?$ /index.php?$1=$2 [L]
RewriteCond %{HTTP_HOST} !^www\.mydomain\.tld$
RewriteRule ^(.*)$ [mydomain.tld$1...] [R=301,L]
--------------------------------------------------------------

Does anyone know what I'm doing wrong here?

Cheers

gergoe

3:12 pm on Jan 8, 2008 (gmt 0)

10+ Year Member



Without going deeper into your rewrite rules, one thing grabbed my attention: the
RewriteCond %{HTTP_HOST} !^www\.mydomain\.tld$

line.

This says that when the Host header of the HTTP request is not your domain name, the request is redirected to your domain name. I'd suggest changing it so that it also accepts an empty Host header, because some (older) HTTP implementations may not send this header, and that would cause an infinite redirect loop. Additionally, if a browser or robot appends the port to the URL, the port also appears in the Host header, so you have to be prepared for that as well. I'd suggest changing the last rule as follows:


RewriteCond %{HTTP_HOST} !^$
RewriteCond %{HTTP_HOST} !^www\.example\.com(:[0-9]+)?$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

May not solve your problem, but will certainly make it more bulletproof.
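For anyone who wants to try the two conditions against sample Host headers, here is a rough Python sketch. It only approximates mod_rewrite: Python's `re` stands in for Apache's regex engine, and the name `needs_redirect` is made up for illustration.

```python
import re

# Second condition: canonical host, optional port, case-insensitive ([NC]).
CANONICAL_HOST = re.compile(r"^www\.example\.com(:[0-9]+)?$", re.IGNORECASE)

def needs_redirect(host_header):
    """Return True when both RewriteCond lines pass, so the redirect rule fires."""
    # RewriteCond %{HTTP_HOST} !^$  (passes only for a non-empty Host header)
    not_empty = re.search(r"^$", host_header) is None
    # RewriteCond %{HTTP_HOST} !^www\.example\.com(:[0-9]+)?$ [NC]
    not_canonical = CANONICAL_HOST.search(host_header) is None
    return not_empty and not_canonical

if __name__ == "__main__":
    for host in ["", "www.example.com", "WWW.Example.com:8080", "example.com"]:
        print(repr(host), needs_redirect(host))
```

An empty header and the canonical name (with or without a port, in any letter case) are left alone; anything else is sent to the canonical host.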

jdMorgan

3:23 pm on Jan 8, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your code is actually pretty nice! -- You've got efficient regex patterns, unlike 99% of what we get to peek at here.

The problem is most likely this: In order to verify your server's 404 response, Google will request a non-existent URL such as "/noexist_f00bac12babecafe.html" from your server, expecting a 404 response.

The hexadecimal number is the same as they use for your account-verification URL-path, with "noexist_" replacing "google" in the URL.

However, this "noexist_" URL will be rewritten to your script by your last internal rewrite rule, and therefore is probably returning a 200-OK, 301, or 302 status instead of the expected/required 404-Not Found.

You can probably fix this by adding a RewriteCond to that last internal rewrite rule, making it:


RewriteCond $1 !^noexist$
RewriteRule ^([^_]+)_([^/]+)/?$ /index.php?$1=$2 [L]
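To see why the last rule catches the "noexist_" probe, here is a minimal Python sketch of that one rule. It is only an approximation of mod_rewrite: Python's `re` stands in for Apache's regex engine, the function name `last_rule_target` and the `with_cond` switch are made up for illustration, and the path is given without its leading slash, as a rule in .htaccess would see it.

```python
import re

# Pattern of the last internal rewrite rule.
LAST_RULE = re.compile(r"^([^_]+)_([^/]+)/?$")

def last_rule_target(path, with_cond=True):
    """Return the rewritten target, or None if the rule does not apply."""
    match = LAST_RULE.search(path)
    if match is None:
        return None  # pattern does not match, request is left alone
    # RewriteCond $1 !^noexist$  (skip the rule when $1 is exactly "noexist")
    if with_cond and re.search(r"^noexist$", match.group(1)):
        return None
    return "/index.php?" + match.group(1) + "=" + match.group(2)

if __name__ == "__main__":
    probe = "noexist_f00bac12babecafe.html"
    print(last_rule_target(probe, with_cond=False))  # rule without the condition
    print(last_rule_target(probe))                   # rule with the condition
    print(last_rule_target("cat_shoes"))             # an ordinary one-pair URL
```

Without the condition the probe is handed to index.php; with it, the probe falls through so the server can answer 404, while ordinary one-pair URLs are still rewritten.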

There are several other minor problems: your rule sets are out of order, and a couple of other details may cause you trouble. I'd suggest the following changes:


RewriteEngine on
#
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.tld(:[0-9]+)?$
RewriteRule (.*) http://www.example.tld/$1 [R=301,L]
#
RewriteRule ^([^_]+)_([^/]+)/([^_]+)_([^/]+)/([^_]+)_([^/]+)/([^_]+)_([^/]+)/([^_]+)_([^/]+)/?$ /index.php?$1=$2&$3=$4&$5=$6&$7=$8&$9=$10 [L]
#
RewriteRule ^([^_]+)_([^/]+)/([^_]+)_([^/]+)/([^_]+)_([^/]+)/([^_]+)_([^/]+)/?$ /index.php?$1=$2&$3=$4&$5=$6&$7=$8 [L]
#
RewriteRule ^([^_]+)_([^/]+)/([^_]+)_([^/]+)/([^_]+)_([^/]+)/?$ /index.php?$1=$2&$3=$4&$5=$6 [L]
#
RewriteRule ^([^_]+)_([^/]+)/([^_]+)_([^/]+)/?$ /index.php?$1=$2&$3=$4 [L]
#
RewriteCond $1 !^noexist$
RewriteRule ^([^_]+)_([^/]+)/?$ /index.php?$1=$2 [L]

  • Placed external redirect rules first, followed by internal rewrite rules
  • Added a check to be sure the hostname is not blank -- to avoid an infinite redirection loop with HTTP/1.0
  • Allowed for valid port numbers appended to hostname
  • Removed redundant anchors on single 'greedy' regular expressions
  • Disabled last rule for Google 404-verification requests
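The order of evaluation described in the list above can be sketched roughly in Python. This is an approximation, not how Apache is implemented: `route`, `CANONICAL`, and the returned action labels are hypothetical names, the path is passed without its leading slash (as in .htaccess), and only the one-pair to four-pair rules are modelled. The five-pair rule is left out because mod_rewrite back-references only run from $0 to $9, so "$10" would expand as $1 followed by a literal 0.

```python
import re

CANONICAL = "www.example.tld"
HOST_OK = re.compile(r"^www\.example\.tld(:[0-9]+)?$")
PAIR = r"([^_]+)_([^/]+)"  # one name_value pair

def route(host, path):
    """Return (action, target) for a request."""
    # External redirect first: fires only for a non-blank, non-canonical host.
    if host and HOST_OK.search(host) is None:
        return ("redirect", "http://" + CANONICAL + "/" + path)
    # Internal rewrites next, longest pattern first (four pairs down to one).
    for n in range(4, 0, -1):
        match = re.search("^" + "/".join([PAIR] * n) + "/?$", path)
        if match is None:
            continue
        groups = match.groups()
        if n == 1 and groups[0] == "noexist":
            continue  # RewriteCond $1 !^noexist$ fails, so the rule is skipped
        query = "&".join(groups[i] + "=" + groups[i + 1] for i in range(0, 2 * n, 2))
        return ("rewrite", "/index.php?" + query)
    return ("pass", "/" + path)  # no rule applied; served (or 404'd) as-is

if __name__ == "__main__":
    print(route("example.tld", "cat_shoes"))
    print(route("www.example.tld:80", "cat_shoes/size_9"))
    print(route("www.example.tld", "noexist_f00bac12babecafe.html"))
    print(route("", "cat_shoes"))
```

Putting the redirect first means a request on the wrong hostname is redirected with its original URL-path, before any internal rewrite to index.php has happened.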

Jim

Karma

3:38 pm on Jan 8, 2008 (gmt 0)

10+ Year Member



Thanks both, great help and works a treat :)