
Google SEO News and Discussion Forum

    
Rewriting Dynamic URLs w/o Having Duplicate Content
bman68




msg:3316461
 3:49 am on Apr 20, 2007 (gmt 0)

Hey guys. First off, all of this is still new to me, so I apologize if I'm not using the correct jargon. I will do my best to make my question as clear as possible. Basically, all of my URLs currently look like this: http://example.com/index.php?page=dir/file. I have just begun learning about mod_rewrite so that my URLs can appear static, such as http://example.com/directory/file. My concern is that Google will see the new static URL as a new page and read it as duplicate content of the dynamic URL. How do I use mod_rewrite without running this risk? I appreciate any help you guys can provide.

Thanks.

[edited by: tedster at 2:40 pm (utc) on April 20, 2007]
[edit reason] switch to example.com - it will never be owned [/edit]

 

tedster




msg:3316856
 2:50 pm on Apr 20, 2007 (gmt 0)

Hello bman68, and welcome to the forums.

You're right to have this issue in mind - and during the transition from your old urls being indexed to the new ones being indexed, both will be there for a period of time. It's unavoidable.

There's a good thread from back in January called URL structure redesign [webmasterworld.com] where jdMorgan gave this excellent approach:

Doing things in the right order and at the right time is everything:
  1. Add code to rewrite (not redirect) the new friendly URLs to the old unfriendly ones needed by your script(s).
  2. Change the links on your pages to use those new friendly URLs.
  3. Get your responsive linking partners to link to the new friendly URLs.
  4. Let this sit awhile, until you see the new URLs appear consistently in the SERPs for important pages.
  5. Add code to 301 (permanently) redirect the unfriendly URLs to the friendly ones to handle non-updated inbound links.

    Don't take exceptional measures to do this fast or all at once, or you can "pull the rug out from under your site" in search. Proceed slowly and very deliberately with regard to your top-ranking pages and main landing pages.

    Someone here (I wish I could remember who, so as to give credit) has argued that starting with updating the links on your lowest-level, least-important pages (at step 2 in the list above) is a good plan, and I tend to agree -- build new internal supports for your top pages before removing the old supports. On a per-page basis, consider this a balancing act between maintaining the PageRank/link-pop support for a page, and avoiding long-term duplicate (old & new) URLs for the same page. This should work well for sites with a small number of well-ranked landing pages, and lots of supporting pages below -- for example, an e-commerce site with a few "main" pages and categories, and lots of product pages below that.

    [webmasterworld.com...]

For technical issues with the URL rewriting itself, that's an Apache Forum matter. Here's an excellent primer on the topic: Changing Dynamic URLs to Static URLs [webmasterworld.com]
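
As a rough illustration of the first step in the quoted list (the internal rewrite), a minimal .htaccess sketch might look like the following. It is not jdMorgan's code. It assumes the file sits in the document root, that mod_rewrite is enabled, and that the friendly path maps one-to-one onto the page parameter of index.php, which may not match the original poster's actual directory names.

```apache
# Sketch only: internal rewrite, no redirect. The browser keeps the friendly URL.
RewriteEngine On

# Leave requests for real files and directories alone
# (images, stylesheets, and index.php itself).
RewriteCond "%{REQUEST_FILENAME}" "!-f"
RewriteCond "%{REQUEST_FILENAME}" "!-d"

# /directory/file  ->  index.php?page=directory/file (served internally)
RewriteRule "^(.+)$" "index.php?page=$1" [L]
```

Because there is no R flag, the server fetches the script output silently and no 301 is sent. The 301 for the old dynamic URLs is a separate, later step.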

    g1smd




    msg:3316859
     2:55 pm on Apr 20, 2007 (gmt 0)

    Just to underline the fact that the solution has both a redirect and a rewrite in it. Whilst similar, you need to understand the important differences between them.

    The redirect generates a 301 code back to the browser. You see the URL change in the address bar of the browser.

    The rewrite internally rewrites the URL to fetch the correct content, but doesn't show that rewritten URL. You see the original URL you requested.

    If you request example.com/directory/file then the server actually pulls the data from example.com/index.php?page=dir/file but doesn't show you that internally rewritten URL.

    If you request example.com/index.php?page=dir/file then the server issues a 301 redirect to example.com/directory/file and you see the URL in the browser change to be that one. The server then uses the internal rewrite (as above) to get you the content.

    Because one of them is a rewrite there is no possibility of there being a loop. If both were redirects then it would always loop forever.

    kevstor




    msg:3318760
     10:13 am on Apr 23, 2007 (gmt 0)

    We have a client with a big site, and we recommended implementing rewrites to build better URLs.

    The big problem is that when Google picked up the mods, traffic dropped off by 70%. This was 2 months ago and it still hasn't recovered. We did not realise that when they first set up the mods, no 301s were put in place. These were set up about two weeks ago, the problem being that for 5 - 6 weeks both old and new URLs were available.

    I don't think dup. content is a problem, but we are concerned as to why Google isn't happy with the new URLs. Sounds like we pretty much did what you recommend above but maybe a bit late with the 301s?

    Yahoo and MSN picked up the mods quickly.

    We also set up an XML sitemap and submitted it, but it doesn't seem to have made any difference.

    Any help appreciated.

    <Sorry, no specifics.
    See Forum Charter [webmasterworld.com]>

    [edited by: tedster at 4:14 pm (utc) on April 23, 2007]

    jdMorgan




    msg:3318824
     12:56 pm on Apr 23, 2007 (gmt 0)

    > Because one of them is a rewrite there is no possibility of there being a loop. If both were redirects then it would always loop forever.

    There is practically a guarantee of a loop, even in the case of a rewrite and a redirect, unless the code that implements the external redirect examines the server variable THE_REQUEST before redirecting. See the threads cited above for details of the correct implementation to avoid this problem.

    The same problem can occur when a redirect and a DirectoryIndex directive conflict, and the solution is identical: Do not redirect unless the 'incorrect' URL was received from the client (browser), rather than generated as the result of a server directive (internal rewrite or DirectoryIndex).
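
    A minimal sketch of the kind of THE_REQUEST check described above, for the example URLs in this thread. It is an illustration under assumptions, not the code from the cited threads: it assumes an .htaccess file in the document root, that "page" is the only query parameter, and that an internal rewrite from the friendly URL to index.php is already in place further down the same file.

```apache
# Sketch only: 301 the old dynamic URL to the friendly one, but only when
# the client itself asked for it.
RewriteEngine On

# THE_REQUEST is the original request line sent by the client, for example
#   GET /index.php?page=dir/file HTTP/1.1
# It is not altered by internal rewrites, so this condition fails when
# index.php is reached via the rewrite and no redirect loop occurs.
RewriteCond "%{THE_REQUEST}" "^[A-Z]{3,9} /index\.php\?page=([^& ]+) HTTP/"

# %1 is the captured page value; the trailing "?" drops the old query string.
RewriteRule "^index\.php$" "/%1?" [R=301,L]
```

    This rule should sit above the internal rewrite rule so that directly requested dynamic URLs are redirected before any rewriting happens.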

    > Sounds like we pretty much did what you recommend above but maybe a bit late with the 301s?

    If the roof of my house is on fire, I would prefer that the firemen extinguish it before it spreads to the whole house... even if they arrive late on the scene. No matter how you do it, changing URLs is likely to result in a temporary impact on the ranking of pages on your site. But you must balance the long-term gain against this short-term pain. And as to being late with the redirects, the old phrase, "Better late than never" applies.

    Jim

    kevstor




    msg:3318851
     1:49 pm on Apr 23, 2007 (gmt 0)

    Thx Jim - I guess it's a case of sit tight and wait for G to do its stuff. Not used to waiting this long!

    Simsi




    msg:3318988
     4:07 pm on Apr 23, 2007 (gmt 0)

    Just one question on this: I have recently done a similar thing, but I populate a drop-down box of the 150 or so products which, being in a form obviously, calls the script to drag them from the DB. Will Google see this and index it alongside the new URLs, do you think?

    g1smd




    msg:3319154
     6:42 pm on Apr 23, 2007 (gmt 0)

    You'll see the old, now redirected, URLs become Supplemental Results when Google starts to figure out what you are doing.

    Don't panic when that happens. It is normal. They may continue to show like that for a year.

    Your measure of success is in seeing that the new URLs are not Supplemental and that as many as possible are indexed.

    nealw




    msg:3325516
     6:29 pm on Apr 29, 2007 (gmt 0)

    To avoid duplicate content using mod_rewrite I do 3 things.

    1.) Add the dynamic URL to my robots exclusion list (robots.txt). This stops compliant crawlers from fetching the URL.

    User-agent: *
    Disallow: /cgi-bin/mydir/mypage.cgi

    or, if you rewrite all URLs, exclude all dynamic URLs.

    User-agent: *
    Disallow: /cgi-bin/

    2.) I add a variable (isstatic) to the rewrite, then do a check in the code to see if the variable is 'yes'. If it is yes then I know the url was invoked from .htaccess.

    Rewrite the dynamic url in .htaccess:

    RewriteEngine on
    RewriteBase /cgi-bin/mydir/
    RewriteRule ^(.+)\.html$ mypage.cgi?action=myaction&id=$1&isstatic=yes

    Static URL:
    www.mysite.com/content/13.html

    3.) I use perl/cgi so in my .cgi file I run a check at the top of the file or sub-routine looking for the "isstatic" variable. If I don't find it I include the noindex meta tag.

    if ($query{'isstatic'} eq "yes") {
        # Invoked from .htaccess / static URL: allow indexing.
        $noindex = "";
    }
    else {
        # Invoked directly via the mypage.cgi script: emit a noindex meta tag.
        $noindex = qq~
    <META NAME="ROBOTS" CONTENT="NOINDEX">
    ~;
    }

    print <<META;
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    <head>

    $noindex

    <meta http-equiv="Content-Language" content="en-us">
    <title>#*$!</title>
    <META NAME="description" CONTENT="#*$!">
    <META NAME="keywords" CONTENT="#*$!">

    META

    ...rest of the cgi code

    Hope this helps.

    Neal

    jd01




    msg:3325563
     7:43 pm on Apr 29, 2007 (gmt 0)

    The only alternate suggestion I can make (to avoid duplication of content for a short period of time) is to set the redirect immediately from the old locations to the new, but leave the links running through the redirect until the new locations replace the old in the SERPs.

    Justin

    This is one of the few areas where jdMorgan and I do not totally agree, but I assume he has tested what he is suggesting, as I have tested what I am suggesting, so it may be more a matter of preference than correct v. incorrect. When properly redirected, URLs should only have a very short period of time where they drop in rankings.

    steve127




    msg:3329343
     12:28 am on May 3, 2007 (gmt 0)

    I have set up ISAPI-rewrite tool to handle redirecting my pages from HTM to ASP. So for example the old URL was http://www.example.com/default.htm and I wanted it to switch to http://www.example.com/default.asp

    Right now, it does so. The search engines display the HTM in most cases on a keyword search, and when you click on the link it automatically transfers to the .asp pages. The URL still remains the same in the browser after the new page comes up.

    When I do a header check, the results say the page is sending a 200 code.

    My questions are the following:

    Does it still need to send a 301 code? If so, how do we change it? Is Google penalizing us for setting up ISAPI the way it currently is? Must it be changed to produce a 301 redirect, and if so, how do we do it? Our host set up ISAPI for us and we are not sure how they set it up.

    As far as functionality goes, it is performing the way it needs to, but I am concerned with how Google is viewing it and whether it is acceptable to them. The ASP pages are indexed. Our search positioning had dropped before we made the switch, but it has not improved since we made the switch about 4 weeks ago.

    Is ISAPI a good way to set it up, and within ISAPI what is the best way that Google will approve of?
    Any help clarifying this would be appreciated.

    Thanks.

    Steve

    [edited by: tedster at 1:10 am (utc) on May 3, 2007]
    [edit reason] change to example.com - it will never be owned [/edit]

    g1smd




    msg:3329351
     12:53 am on May 3, 2007 (gmt 0)

    I would set the server up so that you can still use .htm filenames but the server knows to process them as ASP scripts.

    That way you have no URL changes at all.

    If you really must have new names, and I recommend that you do not have new names, then you MUST set up a 301 redirect from old to new.

    steve127




    msg:3329925
     4:30 pm on May 3, 2007 (gmt 0)

    I think that is what it is already doing. It goes from HTM to ASP but it keeps the HTM URL. However, there are two files existing, an HTM and an ASP, with duplicate content. Both pages are giving me 200 codes. Is this OK, then? Does it need to give a 301? How do we set this up properly in ISAPI? We are currently using ISAPI, but maybe it is not set up to be Google friendly? Are we OK with what we have?

    g1smd




    msg:3330208
     10:09 pm on May 3, 2007 (gmt 0)

    If you have an internal rewrite from htm to asp then you also need an external redirect from asp to htm.

    This is very easy to set up in Apache, but I have no clue how you would do that in IIS or ISAPI.
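
    For readers on Apache, the pairing described above (internal rewrite from .htm to .asp, external redirect from .asp back to .htm) might be sketched like this. It is an illustration only and assumes an .htaccess file in the document root. The poster's IIS/ISAPI setup is unknown, and this is not a statement of how ISAPI-rewrite should be configured.

```apache
# Sketch only: keep the .htm URLs public and serve them from .asp scripts.
RewriteEngine On

# External 301: fires only when the client itself requested a .asp URL,
# which is checked against the original request line in THE_REQUEST.
RewriteCond "%{THE_REQUEST}" "^[A-Z]{3,9} /([^? ]+)\.asp[? ]"
RewriteRule "\.asp$" "/%1.htm" [R=301,L]

# Internal rewrite: a request for default.htm is served by default.asp.
# THE_REQUEST still shows the .htm URL here, so the rule above does not loop.
RewriteRule "^(.+)\.htm$" "$1.asp" [L]
```

    With both rules in place, only the .htm URL returns a 200 to the outside world, and the .asp URL returns a 301 pointing at it.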
