Apache Web Server Forum

    
Query String and Duplicate content
member22




msg:4606148
 2:55 pm on Aug 30, 2013 (gmt 0)

To stop Google from indexing URLs with query strings and creating duplicate content, I would like to return 410 Gone for every URL that has a query string.
I would like to insert:

RewriteCond %{QUERY_STRING} .
RewriteRule ^ - [G]

Where should I add this code? I am using Joomla, and the problem I am having is discussed in this Google SEO thread: [webmasterworld.com...]

Here is a copy of my .htaccess:

##
# @version $Id: htaccess.txt 10492 2008-07-02 06:38:28Z ircmaxell $
# @package Joomla
# @copyright Copyright (C) 2005 - 2008 Open Source Matters. All rights reserved.
# @license http://www.gnu.org/copyleft/gpl.html GNU/GPL
# Joomla! is Free Software
##

#redirect all non www addresses to www#
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

#redirect 301 /home-page to /#
Redirect 301 /home-page http://www.example.com/

#Redirect index.php to /#
Options +FollowSymLinks
DirectoryIndex index.php
RewriteEngine On
RewriteBase /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

#RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /home-page\.html\ HTTP/
#RewriteRule ^home-page\.html$ http://www.example.com/ [R=301,L]

#####################################################
# READ THIS COMPLETELY IF YOU CHOOSE TO USE THIS FILE
#
# The line just below this section: 'Options +FollowSymLinks' may cause problems
# with some server configurations. It is required for use of mod_rewrite, but may already
# be set by your server administrator in a way that dissallows changing it in
# your .htaccess file. If using it causes your server to error out, comment it out (add # to
# beginning of line), reload your site in your browser and test your sef url's. If they work,
# it has been set by your server administrator and you do not need it set here.
#
#####################################################

## Can be commented out if causes errors, see notes above.
Options +FollowSymLinks

#
# mod_rewrite in use

RewriteEngine On
<IfModule mod_expires.c>
<FilesMatch "\.(gif|jpg|jpeg|png|swf|css|js|html?|xml|txt)$">
ExpiresActive On
ExpiresDefault "access plus 10 years"
</FilesMatch>
</IfModule>
<IfModule mod_rewrite.c>
RewriteEngine On

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*\.(js|css))$ smartoptimizer/?$1

<IfModule mod_expires.c>
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*\.(js|css|html?|xml|txt))$ smartoptimizer/?$1
</IfModule>

<IfModule !mod_expires.c>
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*\.(gif|jpg|jpeg|png|swf|css|js|html?|xml|txt))$ smartoptimizer/?$1
</IfModule>
</IfModule>
<FilesMatch "\.(gif|jpg|jpeg|png|swf|css|js|html?|xml|txt)$">
FileETag none
</FilesMatch>
Options +FollowSymlinks

#rewritecond %{http_host} ^example.com [nc]
#rewriterule ^(.*)$ http://www.example.com/ [r=301,nc]
########## Begin - Rewrite rules to block out some common exploits
## If you experience problems on your site block out the operations listed below
## This attempts to block the most common type of exploit `attempts` to Joomla!
#
# Block out any script trying to set a mosConfig value through the URL
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|\%3D) [OR]
# Block out any script trying to base64_encode crap to send via URL
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [OR]
# Block out any script that includes a <script> tag in URL
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
# Block out any script trying to set a PHP GLOBALS variable via URL
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
# Block out any script trying to modify a _REQUEST variable via URL
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
# Send all blocked request to homepage with 403 Forbidden error!
RewriteRule ^(.*)$ index.php [F,L]
#
########## End - Rewrite rules to block out some common exploits

# Uncomment following line if your webserver's URL
# is not directly related to physical file paths.
# Update Your Joomla! Directory (just / for root)

# RewriteBase /


########## Begin - Joomla! core SEF Section
#
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/index.php
RewriteCond %{REQUEST_URI} (/|\.php|\.html|\.htm|\.feed|\.pdf|\.raw|/[^.]*)$ [NC]
RewriteRule (.*) index.php
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
#
########## End - Joomla! core SEF Section//301 Redirect Old FileRedirect 301
Redirect /example.html http://www.example.com/my-example.html
Redirect /example/example/my-example.html http://www.example.com/example/my-second-example.html
Redirect /home http://www.example.com
Redirect /home.html http://www.example.com
Redirect /home-page.html http://www.example.com

[edited by: phranque at 8:54 pm (utc) on Aug 30, 2013]
[edit reason] unlinked urls [/edit]

 

member22




msg:4606189
 5:26 pm on Aug 30, 2013 (gmt 0)

I just tried adding the following lines of code

RewriteCond %{QUERY_STRING} .
RewriteRule ^ - [G]

directly after this rule

RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

in my .htaccess, and now I get a 410 Gone response when I request www.example.com/robots.txt

Here is the message I get:

Gone

The requested resource
/smartoptimizer/
is no longer available on this server and there is no forwarding address. Please remove all references to this resource.

Any idea what the issue is and how to fix it?

[edited by: phranque at 8:51 pm (utc) on Aug 30, 2013]
[edit reason] unlinked & exemplified urls [/edit]

g1smd




msg:4606228
 8:33 pm on Aug 30, 2013 (gmt 0)

First you need to get a much newer copy of the basic Joomla htaccess file. Yours is very old and contains several flaws.

Having obtained a new file, you'll then need to customise it with your various changes. However, you need to do it slightly differently.

The rules in your current file are in the wrong order. The new htaccess file available for Joomla gives hints as to where additional rules should go. Follow those directions closely.

The first section should contain rules that block access. The next section should be RewriteRules configured as redirects, with the non-www/www rule as the last one. The final section should be RewriteRules configured as internal rewrites.

The rule order is very important.
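
As a rough sketch of that three-part ordering (the individual rules are placeholders borrowed from the posted file, and www.example.com stands in for the real hostname; this is not the current Joomla default file):

```apache
RewriteEngine On

# 1. Access blocks first
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\)
RewriteRule ^ - [F]

# 2. External redirects next, with non-www to www as the last of them
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# 3. Internal rewrites last
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```

Any site-specific rule would then be slotted into whichever of the three sections matches what it does.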

phranque




msg:4606241
 9:08 pm on Aug 30, 2013 (gmt 0)

you will also need to change all your Redirect directives to RewriteRule directives.

http://httpd.apache.org/docs/current/rewrite/avoid.html:
The use of RewriteRule to perform this task may be appropriate if there are other RewriteRule directives in the same scope. This is because, when there are Redirect and RewriteRule directives in the same scope, the RewriteRule directives will run first, regardless of the order of appearance in the configuration file.


running these rules in the wrong order may cause:
- exposure of mod_rewrite-rewritten urls in a subsequent mod_alias Redirect (likely what caused your /smartoptimizer/ redirect)
- multiple redirects to reach the canonical url
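
for example, two of the Redirect lines from the posted file could be rewritten along these lines. this is only a sketch: it assumes the .htaccess sits in the document root, that each old URL should match exactly (mod_alias Redirect matches by prefix), and that a permanent 301 is wanted in both cases.

```apache
# was: Redirect 301 /home-page http://www.example.com/
RewriteRule ^home-page$ http://www.example.com/ [R=301,L]

# was: Redirect /example.html http://www.example.com/my-example.html
RewriteRule ^example\.html$ http://www.example.com/my-example.html [R=301,L]
```

in .htaccess the RewriteRule pattern is matched without the leading slash, which is why it is dropped from the left-hand side.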


RewriteRule ^(.*\.(js|css))$ smartoptimizer/?$1

are you trying to pass the file name in a query string to the default directory index document for the /smartoptimizer/ directory?
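
if so, that would likely explain the 410 on robots.txt: the rewrite gives the request a query string, and when the rules are processed again for the rewritten URL, a blanket QUERY_STRING test now matches. one possible sketch, assuming the goal is to catch only query strings the visitor actually sent, is to test the original request line instead:

```apache
# THE_REQUEST holds the request line as received from the client,
# e.g. "GET /page.html?foo=bar HTTP/1.1", and is not altered by internal rewrites.
RewriteCond %{THE_REQUEST} \?\S
RewriteRule ^ - [G]
```

note that this would also return Gone for any URL on the site that legitimately needs parameters, so it may need exclusions.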

<IfModule mod_expires.c>
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*\.(js|css|html?|xml|txt))$ smartoptimizer/?$1
</IfModule>

besides rarely actually requiring an IfModule container, why are you checking for mod_expires when you aren't using any mod_expires directives in that container?

lucy24




msg:4606258
 10:04 pm on Aug 30, 2013 (gmt 0)

Urk.

#1 Listen to anything g1smd says about Joomla htaccess files. He knows whereof he speaks.

#2 Get rid of any envelopes in the form <IfModule...> Not the contents! Just the envelope itself, opening and closing. Either you have the mod or you don't.

#3 As noted above, any rules using mod_alias (Redirect by that name) must be changed to use mod_rewrite (RewriteRule with [R=301] flag). If there are many of them it can be done globally in any text editor that does Regular Expressions. I only saw one or two, so you can change them manually.

#4 Functionally it doesn't matter how you arrange your htaccess. Each module will execute separately, in an order determined by the server. But for your own sanity, group each module together. Put any one-off directives such as Options or ErrorDocument at the top of the htaccess where you can easily find them.

And now

#5 Get all your RewriteRules in the right order. In general this means:

(a) group rules in order of severity. First any access-control rules ([F] flag); then any deleted pages ([G] flag); then the redirects ([R=301,L] flag); finally the simple rewrites ([L] flag only). There are exceptions, but that's the default ordering.

(b) within each of these categories, list rules from most specific to most general. Like this:

Specific redirects for specific pages:
RewriteRule ^directory/pagename\.html$ http://www.example.com/some-other-path [R=301,L]

and then for whole directories
RewriteRule ^directory/(.*) http://www.example.com/$1 [R=301,L]

Your last redirect will ordinarily be domain-name canonicalization (see any of several thousand earlier threads in Apache subforum), and your second-to-last will ordinarily be the "index.html" redirect (again, see earlier threads).
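
For the index redirect, a minimal sketch based on the rule already in the posted file (it assumes index.php in the site root and www.example.com as the canonical hostname) might look like this:

```apache
# Second-to-last redirect: a direct client request for /index.php goes to the root.
# Testing THE_REQUEST keeps this from firing on internal rewrites to index.php.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]

# The domain-name canonicalization redirect would follow here as the last redirect.
```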

You are using a CMS that involves rewriting everything to index.php, so there will be some extra rules involving requests for index.php vs. requests for other pages.

Your generic htaccess probably has a number of conditions involving -f or -d. You can get rid of most of these by constraining the rule to requests for pages, for example
(^|\.html|\.php|/)$
It is very unlikely that your CMS involves itself in rewriting requests for non-page files.
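
A sketch of what that could look like, assuming every page URL on the site is the root, ends in a slash, or ends in .html or .php (extensionless URLs would need another alternative in the pattern, and real .php files other than index.php would need excluding):

```apache
# Only page-like requests match the pattern, so images, CSS and JS are never
# rewritten and need no -f / -d checks.
RewriteCond %{REQUEST_URI} !^/index\.php
RewriteRule (^|\.html|\.php|/)$ /index.php [L]
```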

Any rule that involves bad parameters will have a preceding condition that looks something like this

RewriteCond %{QUERY_STRING} (^|&)options=blahblah

where "blahblah" means any values that the "options" parameter is not allowed to have. Details will depend on exactly what the parameters are called and what their potential values are. Admittedly this is easier to do in hand-rolled php where the page itself looks at the values and decides what action to take (404, 301 or 200). But it can be done in htaccess too.
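
As a purely hypothetical illustration, suppose "options" must never be "print" or "raw" (made-up values), and such requests should get a 410:

```apache
# Matches options=print or options=raw as a complete parameter,
# wherever it appears in the query string.
RewriteCond %{QUERY_STRING} (^|&)options=(print|raw)(&|$)
RewriteRule ^ - [G]
```

The trailing (&|$) keeps the rule from catching longer values that merely start with a forbidden one.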

I may have overlapped phranque's post a little. But at least I kept my RewriteCond from winking.
