how to handle http://www.mydomain.com/.html error

weird mod rewrite error

         

colombo

8:51 am on Nov 7, 2008 (gmt 0)


Hi,
Google Webmaster Tools is reporting an error for a URL ending in /.html.

For example, I use this mod_rewrite rule:
RewriteRule ^dir1/(.*)\.html somefile.php?id=$1 [L]

and the error is reported for this page:

http://www.example.com/dir1/.html

When I go to this URL, I get a 403 Forbidden page.

Does anyone know how to solve this issue?

Thanks for your help :)

[edited by: jdMorgan at 1:26 pm (utc) on Nov. 7, 2008]
[edit reason] example.com [/edit]

jdMorgan

1:38 pm on Nov 7, 2008 (gmt 0)


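The first problem is that ".*" will match *anything*, including an empty string. Since it makes no sense to have a page named just "/.html", the rule should be changed to require at least one character before the file extension. The version below also excludes periods from the captured name and anchors the end of the pattern with "$":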

RewriteRule ^dir1/([^.]+)\.html$ somefile.php?id=$1 [L]
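
To see the difference between the two patterns, here is a rough sketch in Python rather than Apache config. It assumes Python's re module treats these simple patterns the same way Apache's regex engine does, and that the paths are tested without a leading slash, as mod_rewrite sees them in a per-directory (.htaccess) context. The sample paths and the helper name captured_id are made up for illustration.

```python
import re

# Patterns copied from the two RewriteRule lines in this thread.
ORIGINAL = re.compile(r"^dir1/(.*)\.html")
REVISED = re.compile(r"^dir1/([^.]+)\.html$")


def captured_id(pattern, path):
    """Return what $1 would hold, or None if the pattern does not match."""
    match = pattern.search(path)
    return match.group(1) if match else None


# Hypothetical request paths, leading slash already removed (assumption).
SAMPLES = ["dir1/page.html", "dir1/.html", "dir1/a.b.html", "dir1/page.html.bak"]

for path in SAMPLES:
    print(f"{path!r}: original={captured_id(ORIGINAL, path)!r} "
          f"revised={captured_id(REVISED, path)!r}")
```

In this sketch the original pattern matches "dir1/.html" with an empty id, and also matches "dir1/page.html.bak" because nothing anchors the end. The revised pattern accepts only "dir1/page.html". Note that it also rejects names containing a period, such as "dir1/a.b.html", which may or may not be what you want.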

This leaves the question, "How did Google find a page named /dir1/.html in the first place?" I can't answer that, but you should try some "inurl:" searches to see if you can find one or more links to this "page". In the meantime, you can add another rule to detect and "fix" these listings in Google's database.

For purposes of rule ordering, this would be classed as an external redirect, and should therefore precede all of your internal rewrite rules. Place external redirect rules first in your .htaccess file, in order from most-specific pattern to least-specific pattern, followed by your internal rewrites, again in order from most-specific pattern to least-specific pattern.

This rule will detect "/.html" (.html file extensions without filenames), and return a 410-Gone response:


RewriteRule ^([^/]+/)*\.html$ - [G]

With this rule in place, WMT will show "40x" errors on these "/.html" URLs for a while, but hopefully they will be dropped from the report eventually (weeks, months?).
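
As a quick sanity check of which URLs the 410 pattern would catch, here is a small Python sketch. It assumes Python's re module behaves like Apache's regex engine for this pattern, and that paths are tested without the leading slash, as in .htaccess. The sample paths and the function name is_gone are hypothetical.

```python
import re

# Pattern copied from the 410-Gone rule above.
GONE_PATTERN = re.compile(r"^([^/]+/)*\.html$")


def is_gone(path):
    """True if the 410 rule's pattern would match this URL-path."""
    return GONE_PATTERN.search(path) is not None


# Hypothetical paths: extension-only names at several depths, plus real pages.
for path in [".html", "dir1/.html", "dir1/sub/.html", "dir1/page.html", "index.html"]:
    print(f"{path!r}: {'410 Gone' if is_gone(path) else 'not matched'}")
```

In this sketch the pattern catches a bare ".html" at any directory depth and leaves normally named pages alone.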

If you can find any links on the Web pointing to "/.html" pages on your site, you may be able to determine where the links were intended to point. If so, you can 301-redirect these bad links to a valid page. But until it is established where the links were supposed to point, the 410 is the proper response.

It is also possible that these "/.html" URLs were "exposed" by incorrect coding or ordering of your rewrites and redirects, so read the above description of proper rule ordering carefully, and make sure that your code is ordered as described.
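
To illustrate why the ordering matters, here is a deliberately simplified Python model, not real mod_rewrite: rules are tried top to bottom and the first match ends processing, roughly as [L] and [G] do. It uses the original, unfixed rewrite pattern from the first post. The function name first_match and the action labels are invented for this sketch.

```python
import re


def first_match(rules, path):
    """Return the action of the first rule whose pattern matches, else None.

    Simplified model: every rule is treated as terminal, like [L] or [G].
    """
    for pattern, action in rules:
        if re.search(pattern, path):
            return action
    return None


# The 410 rule from this post and the original rewrite from the first post.
GONE = (r"^([^/]+/)*\.html$", "410 Gone")
REWRITE = (r"^dir1/(.*)\.html", "rewrite to somefile.php")

# Gone rule placed before the internal rewrite, as recommended.
print(first_match([GONE, REWRITE], "dir1/.html"))
# Internal rewrite placed first: the Gone rule is never reached.
print(first_match([REWRITE, GONE], "dir1/.html"))
```

With the Gone rule first, the bad URL gets the 410. With the loose rewrite first, the bad URL is rewritten before the Gone rule can see it. A normal page such as "dir1/page.html" reaches the rewrite in either order.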

Jim