Thought we had a good htaccess file, but

then I saw this entry show up in the SERPs

         

DannyTweb

12:29 pm on Jan 19, 2008 (gmt 0)

10+ Year Member



I thought our htaccess file was written correctly, but then

[mysite.com...] showed up in the SERPs.

Any help? Thanks

current htaccess:

Options -Multiviews

RewriteEngine on
#
# Externally redirect direct client requests for "/index.htm" to "/" in
# canonical domain (This applies to /index pages in any directory)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ [mysite.com...] [R=301,L]
#
# Externally redirect to remove multiple contiguous slashes at beginning or end of URL
RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$
RewriteRule / [mysite.com...] [R=301,L]
#
# Externally redirect to remove multiple contiguous slashes embedded in URL
RewriteCond %{REQUEST_URI} ^/([^/]+)//+(.*)$
RewriteRule // [mysite.com...] [R=301,L]
#
# Externally redirect non-canonical domain requests to canonical domain.
RewriteCond %{HTTP_HOST} ^mysite\.com [NC]
RewriteRule (.*) [mysite...] [R=301,L]

AddHandler server-parsed .htm

Pfui

5:08 pm on Jan 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I hope there's a fix for these question-mark oddities, too. Mine are usually hit only by Googlebot and they look like:

http://example.com/dir/filename.html?ref=somesite.com

Googlebot gets a 200 because the file exists and the variation isn't 404'd.

Thing is, the "?ref=" includes sites with objectionable names (read: nasty words) or from countries with reputations for site-whacking (.info, .ru, etc.). I want them 404'd forever.

Alas, this (probably wrong:) code doesn't do the trick:


RewriteCond %{REQUEST_URI} ^(.*)ref=(.*) [NC,OR]
RewriteCond %{REQUEST_URI} ^(.*)badwordhere(.*) [NC,OR]
RewriteCond %{REQUEST_URI} ^(.*)badsitehere(.*) [NC]
RewriteCond %{REQUEST_URI} !^/error\.html$
RewriteRule .* - [F]

jdMorgan

6:06 pm on Jan 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Query strings are not part of REQUEST_URI, because mod_rewrite treats that variable as the URL-path only -- the query string is data to be passed to the resource at that path (normally a script), and it is held separately in QUERY_STRING. To be clear, a script has a "location" on the web, but the data passed to it does not.

As a simple analogy, we put our letter inside an addressed envelope, but we do not consider the contents of our letter to be part of the address that we write on the envelope. In this analogy, the URL is the "address" and the query string is the "message."

Query strings are handled separately in mod_rewrite, requiring the use of a RewriteCond. The RewriteCond can test either QUERY_STRING or THE_REQUEST, although in this case we must use THE_REQUEST to catch the case where the query string is blank, but a "?" is appended to the URL.
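As a rough illustration of the QUERY_STRING option (a sketch, not code from the original posts): the rule below acts only on requests that carry a "ref" parameter, which is the case described earlier in the thread. The parameter name and www.example.com are placeholders, and the sketch assumes the rules live in the document-root .htaccess file. Note that it drops the entire query string, not just "ref".

```apache
RewriteEngine on
# Assumption: "ref" is the unwanted parameter. Match it at the start of the
# query string or after an "&", so that e.g. "pref=" is not caught by accident.
RewriteCond %{QUERY_STRING} (^|&)ref= [NC]
# Redirect to the same path; the trailing "?" discards the query string.
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
```

Because this tests QUERY_STRING, it will not notice a bare "?" with nothing after it; that case still needs a test against THE_REQUEST.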

Here is a general fix, but be warned, it does not include loop prevention, and it redirects to remove query strings from *all* requests. It cannot be used without modification on sites using scripts to generate pages, nor can it be used as-is on sites which use custom error pages. It also implements a 301 redirect and not a 403-Forbidden response, because it is far more important to keep spurious query strings out of search engine indexes than it is to return a 403 to log spammers (they pay absolutely no attention to your server's response, because all they want is to get listed in your log file).


RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

For sites which mix dynamic and static pages, the following can be used to remove query strings from everything except direct client requests for .php files:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?
RewriteRule !\.php$ http://www.example.com/%1? [R=301,L]
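To make the behaviour of this rule concrete, here is a hedged walk-through of how it should treat a few sample request lines. The request lines are made-up illustrations rather than real log entries, and the outcomes in the comments assume the rule sits in the document-root .htaccess file.

```apache
# THE_REQUEST is the raw first line of the client's request.
# Hypothetical samples:
#
#   GET /dir/filename.html?ref=somesite.com HTTP/1.1
#       condition matches; %1 = "dir/filename.html";
#       301 to http://www.example.com/dir/filename.html
#   GET /dir/filename.html? HTTP/1.1
#       condition matches (bare "?"); same redirect
#   GET /dir/filename.html HTTP/1.1
#       no "?" in the request line, so no match and no redirect
#   GET /script.php?id=5 HTTP/1.1
#       the RewriteRule pattern !\.php$ rejects it, so it is left alone
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?
RewriteRule !\.php$ http://www.example.com/%1? [R=301,L]
```

The substitution uses %1 (captured by the RewriteCond) rather than $1, because a negated RewriteRule pattern cannot supply a back-reference.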

Tweak to suit! :)

Jim

DannyTweb

6:35 pm on Jan 19, 2008 (gmt 0)

10+ Year Member



Oh boy. Thanks, but I just know that if I start tweaking what I posted, I am gonna mess something else up, since I am not sure what I am doing in the first place; I just copied my htaccess file from somewhere else.

jdMorgan

7:06 pm on Jan 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ah, well... This is the place to *learn* how to configure Apache. It's not terribly difficult, but neither is it trivial. I can't recommend cutting and pasting code from *any forum* without understanding exactly what it does to your server configuration -- and more to the point, what using it might do to your search engine rankings, traffic, income, etc.

If these are not important to you, then you can avoid having to research [webmasterworld.com] and modify before using... ;)

Jim