How to 301 query string URLs to avoid duplicate content?

I am re-writing URLs OK, but now have two URLs per page

         

lockwood77

6:18 pm on May 17, 2010 (gmt 0)

10+ Year Member



Hi, first post here and thanks already to those who have posted before - I have managed to get my .htaccess to rewrite query string URLs to friendly URLs from the information I found!

However, I would now like to ensure that the original query string URLs no longer serve content, so that there are no duplicate content issues. My current .htaccess is:


RewriteEngine on
rewritecond %{http_host} ^example\.co\.uk [nc]
rewriterule ^(.*)$ http://www.example.co.uk/$1 [r=301,nc]

RewriteRule ^friendly-url$ /index.php?page=pagename [QSA,L]


That works fine, but how do I 301 index.php?page=pagename to the friendly-url? Bearing in mind /friendly-url is not actually a file on the server of course!

Thanks in advance for your help. :)

g1smd

9:15 pm on May 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It doesn't have to be a file on the server. You have a rewrite that accepts friendly URL requests and maps them to the correct internal server filepath.


I have managed to get my .htaccess to rewrite query string URLs to friendly URLs

Not quite. The rewrite maps external incoming URL requests to internal server filepaths.

jdMorgan

3:33 pm on May 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is complicated by two factors. First, if you do this in .htaccess, you'll need one rule for each 'friendly URL', whereas a script could look up the 'friendly URL' for each query-string filepath (old URL). Second, your original code allows query strings to be appended to the friendly URLs, which makes the reverse-redirect code quite a bit more complex.

However, something like this should work, if you repeat the first rule for each friendly/unfriendly pair:

RewriteEngine on
#
# Redirect direct client script filepath requests back to friendly URL
# Additional parameters may precede, but none may follow "page=pagename"
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php\?(([^&]*(&[^&]*)*)&)?page=pagename\ HTTP/ [OR]
# Additional parameters must precede, and others may follow "page=pagename"
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php\?(([^&]*(&[^&]*)*)&)page=pagename((&[^&#\ ]*)*)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/friendly-url?%2%4 [R=301,L]
#
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
# Alternate domain canonicalization rule (much more robust,
# but does not support additional subdomains as-is)
# RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
# RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
RewriteRule ^friendly-url$ /index.php?page=pagename [QSA,L]

Here, we test THE_REQUEST to make sure that the 'unfriendly URL' is being directly-requested by the HTTP client (i.e. browser or SE robot), rather than as a result of the previously-invoked internal rewrite. This is necessary to prevent an 'infinite' rewrite/redirect loop in .htaccess.
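As a rough illustration of why THE_REQUEST is the variable tested, the comment-only sketch below traces the two cases. The commented-out QUERY_STRING rule at the end is a hypothetical alternative, not part of the code above. It is shown only for contrast, on the assumption that the rules live in .htaccess, where the rule set is re-run after an internal rewrite.

```apache
# Case 1: client requests the friendly URL
#   THE_REQUEST = "GET /friendly-url HTTP/1.1"   (fixed for the whole request)
#   After the internal rewrite: URL-path = index.php, QUERY_STRING = page=pagename
#   The redirect rule's RewriteCond on THE_REQUEST does not match -> no redirect
#
# Case 2: client requests the script directly
#   THE_REQUEST = "GET /index.php?page=pagename HTTP/1.1"
#   The RewriteCond matches -> 301 to /friendly-url
#
# Loop-prone alternative (for contrast only, do not enable):
# QUERY_STRING also holds "page=pagename" after the internal rewrite in Case 1,
# so this condition would match there too and redirect again.
# RewriteCond %{QUERY_STRING} ^page=pagename$
# RewriteRule ^index\.php$ http://www.example.com/friendly-url? [R=301,L]
```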

Note that two RewriteConds are needed per 'unfriendly' URL in order to properly back-reference both preceding and trailing query string parameters without leaving leading or trailing "dangling ampersands" when the "page=pagename" query-part is removed.

I also provided an example of a common-but-more-robust domain canonicalization code snippet, which you can use if you do not plan to use subdomains in addition to the current "www" subdomain.

Jim

lockwood77

12:04 pm on May 19, 2010 (gmt 0)

10+ Year Member



Wow, that's phenomenal!

I would never have got to that under my own steam. I'm enormously grateful for that solution, it absolutely works as desired. Thanks very much. :)

jdMorgan

1:02 pm on May 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Two notes on expanding this code:

If you do not need to support both leading and following additional query string parameters, you can simplify the two-RewriteCond construct in the first rule above, and should do so for efficiency.
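A minimal sketch of that simplification, assuming "page=pagename" is the only query parameter you ever need to handle (a request carrying extra parameters would simply not be redirected by this rule):

```apache
# Redirect only when "page=pagename" is the entire query string
# of a direct client request for /index.php
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php\?page=pagename\ HTTP/
# The trailing "?" on the substitution drops the original query string
RewriteRule ^index\.php$ http://www.example.com/friendly-url? [R=301,L]
```

This keeps the THE_REQUEST test, so it should remain loop-safe alongside the internal rewrite for friendly-url.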

When adding new/additional rules, follow these guidelines:

  • Place access-control rules first. There is no use wasting redirects on unwelcome requests.

  • Follow access controls with all external redirects, in order from most-specific patterns and conditions (i.e. single-page redirects) to least-specific patterns and conditions (i.e. multiple-page redirects such as the domain canonicalization rule).

  • Finally, place all internal rewrites, again in order from most- to least-specific.

    This prevents unexpected rewrites and redirects caused by pattern ambiguity, prevents multiple "chained" or "stacked" redirects for a single client request, and prevents previously-internally-rewritten filepaths from being exposed as URLs to the client by subsequent redirects.

    Where two rules have equal specificity but contain mutually-exclusive patterns and conditions, their relative placement order in your file does not matter; put the most-frequently-requested one first for a (tiny) performance advantage.

    After completing work, test, test, test. Test URLs which should NOT be rewritten or redirected, as well as those that should. Use a reliable tool such as the "Live HTTP Headers" add-on for Firefox and other Mozilla-based browsers to examine the HTTP transactions between your browser and your server to make sure that every request for a valid URL is served directly with a 200-OK, 304-Not Modified, or 206-Partial Content status. Make sure that every request for an obsolete URL results in a single 301 redirect to the correct URL (or a 410-Gone response), as desired.
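The ordering guideline above might be laid out roughly as follows. This is only a skeleton: the "BadBot" user-agent block is a hypothetical access-control rule invented for illustration, and the redirect is the simplified single-parameter form rather than the full two-RewriteCond version.

```apache
RewriteEngine on
#
# 1. Access control first (hypothetical example: refuse a user-agent
#    whose name starts with "BadBot")
RewriteCond %{HTTP_USER_AGENT} ^BadBot [NC]
RewriteRule .* - [F]
#
# 2. External redirects, most-specific first (single-page redirect)...
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php\?page=pagename\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/friendly-url? [R=301,L]
#
#    ...then least-specific (domain canonicalization)
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
# 3. Internal rewrites last, again most- to least-specific
RewriteRule ^friendly-url$ /index.php?page=pagename [QSA,L]
```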

    Jim
    lockwood77

    2:21 pm on May 19, 2010 (gmt 0)

    10+ Year Member



    Excellent, thanks for the tips!

    respect

    12:04 pm on May 20, 2010 (gmt 0)

    10+ Year Member



    Hi, could you help me please?
    How can I create a 301 permanent redirect from a URL with many dynamic parameters to a URL that keeps only one of those parameters?

    Currently, an article has this canonical URL: http://example.com/news/~news=39370

    But it also has three synonymous URLs:

    http://example.com/news/~group__m11=329~page__n16=1~news__n16=39370
    http://example.com/news/~page__n16=1~news__n16=39370
    http://example.com/news/~news__n16=39370

    I need to redirect all three URLs to http://example.com/news/~news=39370
    The rule also needs to work for any other number in place of 39370.


    g1smd

    12:04 pm on May 20, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Let's see your code.

    There are tens of thousands of prior examples here in this forum for you to examine, learn from, and use to build this code.

    respect

    10:03 am on May 21, 2010 (gmt 0)

    10+ Year Member



    For example, this code does not work:
    http://example.com/news/~group__m11=310~page__n16=1~news__n16=39435
    --> http://example.com/news/~group__m11=310~page__n16=1~news__n16=39435


    RewriteRule ^~group__m11=310~page__n16=1(.*)$ http://www.example.com/news/$1 [R=301,L]

    respect

    10:04 am on May 21, 2010 (gmt 0)

    10+ Year Member



    SORRY, the correct version:

    For example, this code does not work:
    http://example.com/news/~group__m11=310~page__n16=1~news__n16=39435
    --> http://example.com/news/~news__n16=39435


    RewriteRule ^~group__m11=310~page__n16=1(.*)$ http://www.example.com/news/$1 [R=301,L]

    jdMorgan

    3:17 pm on May 21, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    This code should work if it is located in example.com/news/.htaccess and if mod_userdir is disabled.

    To be clear, Apache mod_userdir implements a simple "multi-user shared hosting" set-up (often used at universities, etc.) that defines the name following the first "~" in a requested URL-path as a "user" and gives each user a separate "private" filespace under that name. If it is enabled, it will likely severely interfere with your site's operation, since you've used that same "~" character for something else.

    If this code is located in example.com/.htaccess instead of example.com/news/.htaccess, then you will need to put "news/" into your RewriteRule pattern.

    Jim

    respect

    7:49 am on May 22, 2010 (gmt 0)

    10+ Year Member



    Thank you for your answer.
    This code is located in example.com/.htaccess, because news/ is a virtual folder.
    Could you explain how to put "news/" into my RewriteRule?

    jdMorgan

    7:13 pm on May 22, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    I really think you should try it and test it yourself, instead of waiting around here all day for answers that can easily be found at apache.org...

    The RewriteRule pattern must match the URL-path as localized to the directory of the current .htaccess file, that is, the requested path with that directory's own prefix removed.
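For a rule placed in example.com/.htaccess, that might look something like the sketch below, which only adds the "news/" prefix to the pattern posted earlier. The commented-out generalization is an untested assumption: it supposes the parameter values are always digits and that the "group__m11" part is optional, as in the example URLs.

```apache
# In example.com/.htaccess the localized path starts with "news/",
# so the pattern must include it
RewriteRule ^news/~group__m11=310~page__n16=1(.*)$ http://www.example.com/news/$1 [R=301,L]
#
# Hypothetical generalization for any numeric values (use instead of the
# rule above, not in addition to it):
# RewriteRule ^news/~(group__m11=[0-9]+~)?page__n16=[0-9]+~(news__n16=[0-9]+)$ http://www.example.com/news/~$2 [R=301,L]
```

Neither pattern matches a URL that contains only the "news__n16" parameter, so the redirect target itself should not be redirected again.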

    Jim