301 // and /?useless_query to /

And a few others :S

         

LunaC

7:07 pm on Nov 11, 2005 (gmt 0)

10+ Year Member



I've searched and can't find the exact code to solve my current problem.

G. is crawling and indexing the following URLs (thanks to people linking to me like this) and giving me a duplicate content penalty:
example.com// (notice double slashes, server responds code 200 anyway)
example.com/?nonsense_query_string_that_never_existed
example.com/page.php/
example.com/index.php

What I need to do is 301 all of those to the proper pages, i.e. example.com, and example.com/page.php without a trailing slash. I need to do it without upsetting the major search engines with multiple redirects, and without harming my site's search section, which has a URL like:

http://example.com/cgi-bin/search/search.pl?p%3Apm=1&Terms=searchterm

There are many old pages I moved a few months ago, and I've redirected www to non-www, so I already have this in the .htaccess:


Options +FollowSymLinks
RewriteEngine on
rewriterule ^oldpage\.shtml$ http://example.com/folder/newpage.php [R=301,L]
rewriterule ^oldpage2\.shtml$ http://example.com/folder2/newpage2.php [R=301,L]
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^example\.com
RewriteRule (.*) http://example.com/$1 [R=301,L]

.htaccess frightens me at the best of times, and this one is so far over my head I can't even see it. Any help is appreciated.

jdMorgan

7:11 pm on Nov 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do you use any query string anywhere on your site?

If so, you'll need to specifically identify under what circumstances a query string is allowed or not allowed.

Jim

jdMorgan

7:24 pm on Nov 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In the interest of preventing a /. of these threads from the G forum, here's a bestiary of .htaccess mod_rewrite solutions to G indexing problems:

# Remove multiple slashes anywhere in URL (less efficient than next rule)
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . http://example.com%1/%2 [R=301,L]
#
# Remove multiple slashes after domain (more efficient, but not for use in httpd.conf):
RewriteRule ^/(.*)$ http://example.com/$1 [R=301,L]
#
# Remove query strings on *all* requests:
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://example.com/$1? [R=301,L]
#
# Remove trailing slash if filetype present in URL
RewriteRule ^(.+\.[^/]+)/$ http://example.com/$1 [R=301,L]

Jim
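
The double-slash and trailing-slash patterns above can be tried offline before they go anywhere near a live .htaccess. What follows is a minimal sketch, assuming Python's re module behaves like Apache's regex engine for expressions this simple; the helper names collapse_once and strip_trailing_slash are made up for illustration. It applies each pattern exactly once, the way a single redirect would.

```python
import re

# Patterns copied from the rules above. RewriteCond/RewriteRule patterns are
# plain regular expressions, so they can be exercised outside Apache.
DOUBLE_SLASH = re.compile(r"^(.*)//(.*)$")
TRAILING_SLASH = re.compile(r"^(.+\.[^/]+)/$")


def collapse_once(uri: str) -> str:
    """One pass of the double-slash rule: rebuild the URI as %1/%2."""
    m = DOUBLE_SLASH.match(uri)
    return f"{m.group(1)}/{m.group(2)}" if m else uri


def strip_trailing_slash(path: str) -> str:
    """One pass of the trailing-slash rule on a per-directory path."""
    m = TRAILING_SLASH.match(path)
    return m.group(1) if m else path


if __name__ == "__main__":
    for uri in ("/folder//page.php", "//", "///"):
        print(repr(uri), "->", repr(collapse_once(uri)))
    for path in ("page.php/", "folder/"):
        print(repr(path), "->", repr(strip_trailing_slash(path)))
```

Each pass removes only one doubled slash, so in this sketch "///" becomes "//" and needs a second pass to reach "/". The trailing-slash pattern leaves "folder/" alone because it contains no dot.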

Peter

7:59 pm on Nov 11, 2005 (gmt 0)

10+ Year Member



Hello,

A little while ago, after my (shared) server moved up to Apache 2, I found that testing:

RewriteCond %{QUERY_STRING} .

no longer worked against "/foo.html?", i.e. an empty query string.

Since then I've used:

RewriteCond %{THE_REQUEST} [?]
RewriteRule ^(.*)$ http://www.mysite.net/$1? [R=301,L]

Peter.

jdMorgan

8:08 pm on Nov 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, good point. The question mark is not part of either the query string or the URL, and so is only 'visible' to mod_rewrite as part of THE_REQUEST.

Thanks for posting!

Jim
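
The difference between the two tests is easy to see on a raw request line. This is a rough sketch, assuming a request line of the form "GET /path?query HTTP/1.1"; the function names are hypothetical and the splitting is a simplification of what the server does. It only models the behaviour described above: an empty query string gives "." nothing to match, while the "?" is still present in the request line.

```python
import re


def split_target(request_line: str) -> tuple[str, str]:
    """Split 'GET /a?b=1 HTTP/1.1' into (path, query_string)."""
    target = request_line.split(" ")[1]
    path, _, query = target.partition("?")
    return path, query


def query_string_nonempty(request_line: str) -> bool:
    """Models: RewriteCond %{QUERY_STRING} ."""
    _, query = split_target(request_line)
    return re.search(r".", query) is not None


def has_question_mark(request_line: str) -> bool:
    """Models: RewriteCond %{THE_REQUEST} [?]"""
    return re.search(r"[?]", request_line) is not None


if __name__ == "__main__":
    for line in (
        "GET /foo.html HTTP/1.1",
        "GET /foo.html? HTTP/1.1",
        "GET /foo.html?x=1 HTTP/1.1",
    ):
        print(line, query_string_nonempty(line), has_question_mark(line))
```

Only the middle request separates the two tests: its query string is empty, but the request line still contains the question mark.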

LunaC

8:27 pm on Nov 11, 2005 (gmt 0)

10+ Year Member



OK, before I test it live, how does this look:
Options +FollowSymLinks
RewriteEngine on
rewriterule ^oldpage1\.shtml$ http://example.com/folder1/newpage1.php [R=301,L]
rewriterule ^oldpage2\.shtml$ http://example.com/folder2/newpage2.php [R=301,L]
#
# Remove multiple slashes anywhere in URL (less efficient than other rule but works on inner folders?)
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . http://example.com%1/%2 [R=301,L]
# Remove trailing slash if filetype present in URL
RewriteRule ^(.+\.[^/]+)/$ http://example.com/$1 [R=301,L]
#
# Remove query strings on *all* requests (will this kill my site search?)
RewriteCond %{THE_REQUEST} [?]
RewriteRule ^(.*)$ http://example.com/$1? [R=301,L]
#
# Redirect all www to non www
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^example\.com
RewriteRule (.*) http://example.com/$1 [R=301,L]

Any visible errors?
Is that the right order so there won't be multiple redirects? That is, www.example.com/?stuff should go to example.com/ in one step, and so on.

Also, the site search uses a ? in the URL:
http://example.com/cgi-bin/search/search.pl?p%3Apm=1&Terms=searchterm

Also outgoing links look like:
http://example.com/l/go.php?id=333
http://example.com/link/?o=id

Will the removal of all query strings kill those or just ones after example.com/? If it does kill them, is there a way to allow it in specific folders or something?

jdMorgan

8:50 pm on Nov 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Remove query strings on *all* requests

You will need to add an exception to the rule for any pages that require query strings.

Jim

LunaC

9:49 pm on Nov 11, 2005 (gmt 0)

10+ Year Member



OK, I was afraid *all* meant absolutely everything.

Again, I searched, tried a few things and only succeeded in causing server errors. How would I write those exceptions?

jdMorgan

10:56 pm on Nov 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To exclude your search page, add:

RewriteCond %{REQUEST_URI} !^/cgi-bin/search/search\.pl$

Jim

LunaC

12:13 am on Nov 12, 2005 (gmt 0)

10+ Year Member



Thank you so much for your help. .htaccess and its often silent errors scare the heck out of me. The last thing I need right now is to send these poor spiders on a wild goose chase :S

To be sure I understand: you mean I should add it in here like this, right?

RewriteCond %{THE_REQUEST} [?]
RewriteCond %{REQUEST_URI} !^/cgi-bin/search/search\.pl$
RewriteCond %{REQUEST_URI} !^/link/$
RewriteCond %{REQUEST_URI} !^/l/go\.php$
RewriteCond %{REQUEST_URI} !^/l/admin/search\.php$
RewriteCond %{REQUEST_URI} !^/l/admin/view_cat\.php$
RewriteCond %{REQUEST_URI} !^/l/admin/view_stats\.php$
RewriteRule ^(.*)$ http://example.com/$1? [R=301,L]

Is something like this allowed, just to let the entire script in the /l/ folder use query strings, or is it necessary to go through each page in the admin? (There are quite a few, 50 or so I'm guessing from a glance, so I want to be as efficient as possible for my server's sake.)

RewriteCond %{REQUEST_URI} !^/l/*\.php$

If this is possible, is that how it's written? If not, how should it be written?

Again, thank you so much for helping.
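
Consecutive RewriteCond lines are ANDed by default, so the redirect fires only when the request contains a "?" and none of the excluded paths match. A minimal sketch of that logic follows, assuming REQUEST_URI is simply the requested path without its query string and ignoring anything the server may do internally before the rules run. The function name would_strip_query is hypothetical.

```python
import re

# Exclusion patterns taken from the negated RewriteCond lines above.
EXCLUSIONS = [
    r"^/cgi-bin/search/search\.pl$",
    r"^/link/$",
    r"^/l/go\.php$",
    r"^/l/admin/search\.php$",
    r"^/l/admin/view_cat\.php$",
    r"^/l/admin/view_stats\.php$",
]


def would_strip_query(request_line: str) -> bool:
    """True when every condition passes, i.e. the 301 would be issued."""
    # RewriteCond %{THE_REQUEST} [?]
    if re.search(r"[?]", request_line) is None:
        return False
    # Simplified REQUEST_URI: the request target up to the first "?".
    path = request_line.split(" ")[1].partition("?")[0]
    # Each "!pattern" condition passes only if the pattern does NOT match.
    return all(re.search(p, path) is None for p in EXCLUSIONS)


if __name__ == "__main__":
    for line in (
        "GET /?nonsense_query_string HTTP/1.1",
        "GET /cgi-bin/search/search.pl?p%3Apm=1&Terms=searchterm HTTP/1.1",
        "GET /link/?o=id HTTP/1.1",
        "GET /l/go.php?id=333 HTTP/1.1",
        "GET /l/admin/other.php?x=1 HTTP/1.1",
    ):
        print(would_strip_query(line), line)
```

The last request uses a made-up admin page that is not in the list. It would still be redirected, which is why listing pages one by one gets tedious.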

jdMorgan

12:55 am on Nov 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd recommend:

RewriteCond %{REQUEST_URI} !^/l/[^.]+\.php$

Jim
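
To see why the attempted pattern ^/l/*\.php$ falls short, compare it with the recommended one on a few paths. In a regular expression, "/*" means "zero or more slashes", not "anything", whereas "[^.]+" means "one or more characters that are not a dot" (slashes included). This is a small sketch using Python's re module; the sample paths other than those mentioned in the thread are made up.

```python
import re

ATTEMPT = re.compile(r"^/l/*\.php$")          # "/l", any number of "/", then ".php"
RECOMMENDED = re.compile(r"^/l/[^.]+\.php$")  # "/l/", one or more non-dot chars, ".php"

PATHS = [
    "/l/go.php",
    "/l/admin/search.php",
    "/l/admin/view_cat.php",
    "/l.php",     # hypothetical path, matched only by the attempt
    "/page.php",  # outside /l/, matched by neither
]

if __name__ == "__main__":
    for path in PATHS:
        print(
            f"{path:24}"
            f" attempt={ATTEMPT.search(path) is not None}"
            f" recommended={RECOMMENDED.search(path) is not None}"
        )
```

Because "[^.]" also matches "/", the recommended pattern covers scripts in subfolders such as /l/admin/ without listing them individually.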

LunaC

1:18 am on Nov 13, 2005 (gmt 0)

10+ Year Member



Can you tell me where I went wrong on this exclusion line?
RewriteCond %{REQUEST_URI} !^/link/$

The address that needs to work looks like http://example.com/link/?o=id

So far the rest is working (oddly, the search is working fine even without an exclusion rule), but after hours of messing with this line I can't see what's wrong. I've tried with and without the space before !^ and a bunch of other, more-than-likely newbish, variations.

Finally got index.php redirecting properly (I think) to / as well =)

jdMorgan

3:24 am on Nov 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The space before "!" is absolutely required. Without it, you should be getting 500-Server Error, assuming that you flushed your browser cache (as you should) before testing any change to your access-control code.

Your RewriteCond pattern is correct and should work for the example URL you provided, so by circular argument, I have to ask: Did you flush your browser cache before testing?

Jim

LunaC

6:00 am on Nov 13, 2005 (gmt 0)

10+ Year Member



Yup, the space is there (& no 500 error), cleared the cache a few times, tried 3 different browsers all cleared in case something stuck.

The headers show a 301 to example.com/link/index.php, which then returns a 404 since the query string wasn't carried over.

/l/ and /search/ are working perfectly.
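
One thing that can be checked offline is what the /link/ exclusion does and does not cover. The pattern ^/link/$ is an exact match on the path /link/. If the path being tested when the rules run were /link/index.php (the redirect target reported above), the exclusion would not apply. Whether the server actually tests that path is an assumption here, not something confirmed in this thread. The widened pattern below is a hypothetical variant for comparison only.

```python
import re

LINK_EXACT = re.compile(r"^/link/$")
# Hypothetical variant: also accept the directory's index.php.
LINK_OR_INDEX = re.compile(r"^/link/(index\.php)?$")


def excluded(pattern: re.Pattern, path: str) -> bool:
    """True if a negated RewriteCond on this pattern would block the redirect."""
    return pattern.search(path) is not None


if __name__ == "__main__":
    for path in ("/link/", "/link/index.php", "/link/other.php"):
        print(
            f"{path:20}"
            f" exact={excluded(LINK_EXACT, path)}"
            f" or_index={excluded(LINK_OR_INDEX, path)}"
        )
```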

jdMorgan

10:00 pm on Nov 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No idea, unless you've got an Alias directive in httpd.conf that is re-mapping requests for /link/ to some other directory. In this case your rules would never be run, since they would not be in the directory-path for those requests.

Jim