Permanent rewrite with querystring generates 200 OK

         

frances

4:29 am on Dec 5, 2004 (gmt 0)

10+ Year Member



Hi

I recently split a site into two, and have been using .htaccess and mod_rewrite to send urls on the original site to the new one.

Urls on the original site are also rewritten in .htaccess with mod_rewrite.

The rewritten original site urls - they look like [example.com...] - are rewriting fine and generating the correct 301 response header with this:

RewriteRule ^index/([0-9]+)/dir\.htm$ [new_example.com...] [R=301,L]

The problem is with the un-rewritten urls - www.example.com/index.php?id=10&table=dir - a few of which are lurking in Google.

I have been rewriting them with

RewriteCond %{QUERY_STRING} ^id=([0-9]+)\&table=dir$
RewriteRule index\.php [example.com...] [R=301,L]

This works, but it generates a 200 OK response in the WebmasterWorld server header checker.

Is there any way I can get it to generate the 301 response?

Thanks

jdMorgan

5:44 am on Dec 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



frances,

There is no reason why that code --standing alone-- would generate a 200-OK response.

However, I'm a bit fuzzy on which site is which, and whether they are hosted separately with different 'accounts' of the same server, different 'accounts' on different servers, or what. The reason this matters is that something is probably interfering with your code, possibly causing it to loop. The usual suspects in this case are:

  • Trying to rewrite from static to dynamic *and* from dynamic to static in the same .htaccess file (this loops if not done properly)
  • With multiple sites sharing the same account (and therefore the same filespace), the code you use to sort out sites to subdirectories can interfere with your dynamic-to-static rewrite code.
  • Similarly, if a "Control Panel" is used to set up hosting for multiple sites under one account, it may generate code that interferes with your dynamic-to-static rewrite.

    Bear in mind that you should treat code in mod_rewrite as recursive. Assume that after it runs it will call itself, and take steps to prevent that from causing you trouble.

    Jim

    frances

    11:27 am on Dec 5, 2004 (gmt 0)

    10+ Year Member



    Sorry - I copied the second bit of code wrong.

    It should have read:

    RewriteCond %{QUERY_STRING} ^id=([0-9]+)\&table=dir$
    RewriteRule index\.php [new_example.com...] [R=301,L]

    Would that cause a 200 OK response?

    Your other points:

    If static means .htm, then I am only rewriting static to dynamic in that .htaccess.

    The sites share the same account. They are supposed to have separate IP addresses, though I am not sure whether that has been sorted out yet. (But if that is the problem, why do the old rewritten urls - [example.com...] - generate a 301 response when they are redirected, while the old un-rewritten querystring urls generate a 200 OK when they are redirected?)

    I know the set up is a bit messy. I'm sorry. I really appreciate your help.

    jdMorgan

    6:16 pm on Dec 5, 2004 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    This code rewrites a dynamic URL to a static URL:

    RewriteCond %{QUERY_STRING} ^id=([0-9]+)&table=dir$
    RewriteRule ^index\.php$ http://www.example.com/dir/%1.htm? [R=301,L]

    It redirects index.php?id=<numbers>&table=dir to www.example.com/dir/<numbers>.htm
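To make the rule's behaviour concrete, here is a rough Python model of it. This is only an illustration of the matching logic, not how Apache is implemented. The function name `redirect_target` is made up for the sketch, and it assumes the rule lives in an .htaccess file in the document root, where the path is matched without a leading slash.

```python
import re

# Mirrors: RewriteCond %{QUERY_STRING} ^id=([0-9]+)&table=dir$
QUERY_RE = re.compile(r"^id=([0-9]+)&table=dir$")
# Mirrors: RewriteRule ^index\.php$ ...
PATH_RE = re.compile(r"^index\.php$")


def redirect_target(path, query):
    """Return the 301 target for a request, or None if the rule would not fire.

    `path` is the URL-path without a leading slash (hypothetical per-directory view),
    `query` is the raw query string without the leading "?".
    """
    cond = QUERY_RE.search(query)
    if cond is None or PATH_RE.search(path) is None:
        return None
    # %1 in the rule is the first group captured by the RewriteCond.
    # The trailing "?" in the substitution drops the original query string,
    # so nothing is appended after ".htm" here.
    return f"http://www.example.com/dir/{cond.group(1)}.htm"
```

Note that the condition is anchored at both ends, so a request with the parameters in a different order, or with extra parameters, would not be redirected by this rule.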

    As I said, there's nothing seriously wrong with that code, and some other mechanism may be interfering, resulting in a 200-OK. You might want to try a different server headers checker, to make sure that the one you're using is not simply reporting the final result after all redirections are followed.
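One way to rule out a checker that silently follows redirects is to make the request with a client that never follows them. A minimal sketch in Python, assuming the standard library's `http.client`, which hands back the first response as-is. The function name `first_response` and the User-Agent string are invented for this example.

```python
import http.client


def first_response(host, path, port=80, timeout=10):
    """Return (status, Location header) of the first response only.

    http.client does not follow redirects, so a 301 is reported as a 301
    rather than as the 200 of the page it points to.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # HEAD asks for the headers only; the body is not needed here.
        conn.request("HEAD", path, headers={"User-Agent": "header-check-sketch"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Calling `first_response("www.example.com", "/index.php?id=10&table=dir")` with the real host substituted should report 301 plus the new location if the redirect is working. A 200 from this call would suggest the server itself is not redirecting, rather than the checker hiding the redirect.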

    As far as describing your setup, the best way is to make a "map" showing which URLs are redirected, and which are not, breaking it down and annotating it as required to add details:


    example.com/xyz --------- 301 -> new_example.com/abc
    example.com/def/new ----- 301 -> new_example.com/def (redirect image file subdirectory only)
    example.com/def/(others) ------> no change
    example.com/ghi ---------------> no change
    etc.

    Jim

    frances

    9:38 pm on Dec 5, 2004 (gmt 0)

    10+ Year Member



    I have a server header checker in Mozilla Firefox on my own computer, and it reports the right responses when I enter www.example.com?id=123&table=xyz: 301 for www.example.com?id=123&table=xyz and 200 for www.new_example.com/xyz/123.htm.

    But the WebmasterWorld header checker just returns the 200. And I'm worried that Google is doing the same.

    I had assumed that the new site - which had been doing nicely in Google and has now sunk almost without trace - was sandboxed. But when I did a site:www.example.com search in Google, some of the urls with querystrings got a full description and a worrying "supplemental result" comment. The non-querystring versions of the urls just get a single line with no description, like they're on their way out.

    Does this suggest anything to you?

    Otherwise, I guess I'll just wait a while and see if Google sorts it out, and hope the company that is paying me will be patient.

    frances

    11:04 pm on Dec 5, 2004 (gmt 0)

    10+ Year Member



    I should have said in my previous post that I am worried the site is being penalised for duplicate content - but maybe that was obvious.

    jdMorgan

    4:47 am on Dec 6, 2004 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Understood.

    I have tested the response of several of my 301'ed pages using the WebmasterWorld server headers checker, and without exception, they return 301-Moved Permanently status. So, there's still a problem with some other code interfering with the redirect code you posted.

    If you have a "terminal" program, such as the HyperTerminal program that used to ship with Windows 9x (Program Files->Accessories->HyperTerminal), you can manually submit requests to your server, and be sure that there is no "program feature" interfering with the response, since it's a "dumb terminal" program.

    Enter:


    GET /example_filename HTTP/1.1<Enter key>
    User-agent: Mozilla 4.0 (compatible; HyperTerm)<Enter key>
    Host: www.example.org<Enter key><Enter key>

    Note that capitalization is critical, and you can't use backspace or delete, so type carefully!
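For anyone without a terminal program, the same hand-typed request can be sent over a plain TCP socket. The following Python sketch is one possible way to do that, not a replacement for the method above. The helper names (`build_request`, `status_line`, `raw_get`) and the User-Agent string are made up, and the `Connection: close` header is an addition so that the server ends the connection once it has replied.

```python
import socket


def build_request(path, host):
    """Build the hand-typed request as bytes, with CRLF line endings."""
    lines = [
        f"GET {path} HTTP/1.1",
        "User-Agent: Mozilla/4.0 (compatible; raw-check-sketch)",
        f"Host: {host}",
        "Connection: close",  # not in the original; lets recv() see end-of-response
        "",
        "",  # the two empty entries produce the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def status_line(raw_response):
    """Return the first line of a raw HTTP response as text."""
    return raw_response.split(b"\r\n", 1)[0].decode("iso-8859-1")


def raw_get(host, path, port=80, timeout=10):
    """Send the request over a plain socket and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(path, host))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Printing `status_line(raw_get("www.example.org", "/example_filename"))` with the real host and path shows exactly what the server sent first, with no redirect-following in between.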

    Jim

    frances

    10:29 am on Dec 6, 2004 (gmt 0)

    10+ Year Member



    I don't have Windows 98 and I'm not sure what a terminal program is. Could you direct me to one so I could check if there is some interference?

    I have done some checking on server headers for these pages across the web.

    For the non-querystring urls (www.example.com/abc/123.htm), almost all return 301 (though there are a couple of 200s). Some return a two stage header: 301 for the original page, 200 for the new page.

    For the querystring urls (www.example.com?id=123&table=abc), they vary. Some return 200, some 301. Any that return a two stage header are identical to the non-querystring urls.

    I don't know if this suggests anything. It seems weird to me.

    I'd get rid of the querystring urls with 404 headers because I don't need them for Google. But Yahoo only has the querystring urls (which it is updating) and we are doing well in Yahoo. Maybe this is the best option because Yahoo never seems to me to generate much traffic. But then ... what if the sandbox is the Google problem, not duplicate content? My head is spinning!