[mysite.com...] showed up in the SERPs.
Any help? Thanks
Current .htaccess:
Options -Multiviews
RewriteEngine on
#
# Externally redirect direct client requests for "/index.htm" to "/" in
# canonical domain (This applies to /index pages in any directory)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ [mysite.com...] [R=301,L]
#
# Externally redirect to remove multiple contiguous slashes at beginning or end of URL
RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$
RewriteRule / [mysite.com...] [R=301,L]
#
# Externally redirect to remove multiple contiguous slashes embedded in URL
RewriteCond %{REQUEST_URI} ^/([^/]+)//+(.*)$
RewriteRule // [mysite.com...] [R=301,L]
#
# Externally redirect non-canonical domain requests to canonical domain.
RewriteCond %{HTTP_HOST} ^mysite\.com [NC]
RewriteRule (.*) [mysite...] [R=301,L]
AddHandler server-parsed .htm
http://example.com/dir/filename.html?ref=somesite.com
Googlebot gets a 200 because the file exists and the variation isn't 404'd.
Thing is, the "?ref=" includes sites with objectionable names (read: nasty words) or from countries with reputations for site-whacking (.info, .ru, etc.). I want them 404'd forever.
Alas, this (probably wrong:) code doesn't do the trick:
RewriteCond %{REQUEST_URI} ^(.*)ref=(.*) [NC,OR]
RewriteCond %{REQUEST_URI} ^(.*)badwordhere(.*) [NC,OR]
RewriteCond %{REQUEST_URI} ^(.*)badsitehere(.*) [NC]
RewriteCond %{REQUEST_URI} !^/error\.html$
RewriteRule .* - [F]
As a simple analogy, we put our letter inside an addressed envelope, but we do not consider the contents of our letter to be part of the address that we write on the envelope. In this analogy, the URL is the "address" and the query string is the "message."
Query strings are handled separately in mod_rewrite, requiring the use of a RewriteCond. The RewriteCond can test either QUERY_STRING or THE_REQUEST, although in this case we must use THE_REQUEST to catch the case where the query string is blank, but a "?" is appended to the URL.
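To illustrate the point about testing the query string separately, the conditions from the question could be pointed at QUERY_STRING instead of REQUEST_URI. This is only a rough sketch, not a tested configuration: "badwordhere" and "badsitehere" are the placeholders from the question, /error.html is the error page named there, and the pattern assumes the unwanted names always appear inside the "ref" parameter. Note that [F] returns 403-Forbidden rather than a 404; the mod_rewrite documentation describes [R=404] as a way to return other status codes, which may be worth checking against your Apache version.

```apache
# Sketch only: placeholder names taken from the question, adjust to suit.
RewriteEngine on
# Leave the custom error page itself alone.
RewriteCond %{REQUEST_URI} !^/error\.html$
# REQUEST_URI holds only the URL-path, so the "ref" parameter is matched
# against QUERY_STRING, case-insensitively.
RewriteCond %{QUERY_STRING} (^|&)ref=[^&]*(badwordhere|badsitehere) [NC]
# "-" means no substitution; [F] sends 403-Forbidden.
RewriteRule ^ - [F]
```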
Here is a general fix, but be warned, it does not include loop prevention, and it redirects to remove query strings from *all* requests. It cannot be used without modification on sites using scripts to generate pages, nor can it be used as-is on sites which use custom error pages. It also implements a 301 redirect and not a 403-Forbidden response, because it is far more important to keep spurious query strings out of search engine indexes than it is to return a 403 to log spammers (they pay absolutely no attention to your server's response, because all they want is to get listed in your log file).
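# Redirect requests with a "?" after the URL-path to the same path
# with the query string dropped (the trailing "?" in the target clears it)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]+\?
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
#
# Variant: the same redirect, but skipped for URL-paths ending in .php
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?
RewriteRule !\.php$ http://www.example.com/%1? [R=301,L]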
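Where removing every query string is too broad, a narrower variant might act only on the "ref" parameter from the example URL. This is a minimal sketch under stated assumptions, not a tested configuration: it assumes no legitimate URL on the site uses "ref=", and www.example.com stands in for the canonical host as above.

```apache
# Sketch only: assumes "ref" is never a legitimate parameter on this site.
# Redirect only when the query string contains a "ref=" parameter,
# dropping the whole query string (trailing "?" on the target).
RewriteCond %{QUERY_STRING} (^|&)ref= [NC]
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
```

Because the redirect target carries no query string, the follow-up request no longer matches the condition, so this rule should not redirect a second time.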
Jim
If these are not important to you, then you can avoid having to research [webmasterworld.com] and modify before using... ;)
Jim