Forum Moderators: phranque


SEO - Google killing my site because of URL rewrite?


Buster13

3:11 pm on Nov 29, 2009 (gmt 0)

10+ Year Member



Two weeks ago, I re-coded my one-year-old website from dynamic to static URLs.
Now when I google my website example.com, the search results are half dynamic and half static URLs for different pages.

For example, googling "example tshirts" returns:

example.com/shop/mens/t-shirts/44/1185/ (new static URL Ok)
example.com/shop/index.php?c=44 (should have been example.com/shop/t-shirts/)

The above are different pages, so they are NOT duplicate content. My concern is that my website already uses 100% rewritten static URLs, so why, after two weeks, has Google only updated half of them?

By the way, my traffic dropped about 30% last week T.T
(not sure if this is because I send all old, invalid dynamic URLs to a 404)

I should also mention that I used sitemap software to crawl my website, and the NEW sitemap.xml does NOT contain any dynamic URLs (in case someone suspects I did not re-code my URLs correctly ;p).

[edited by: jdMorgan at 1:22 pm (utc) on Nov. 30, 2009]
[edit reason] example.com [/edit]

g1smd

3:35 pm on Nov 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do the internal links on the pages of the site all point to the new URLs, i.e. none of the on-page navigation leads the user to any sort of redirect? Use Xenu's Link Sleuth to check this.

If a user requests an old format URL, do they immediately see content, or are they redirected to a different URL, or are they shown a 404 error? Use Live HTTP Headers to see the actual server response codes. There should be a single-step 301 redirect in place.

If a user requests a new URL, is that served with a "200 OK" response code, and without any preceding redirects? Use Live HTTP Headers to verify this is exactly what happens.
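The checks above boil down to following the redirect chain one hop at a time. Here is a minimal Python sketch of that logic. It works on a hypothetical in-memory table of (status, Location) responses, standing in for what Live HTTP Headers shows, rather than making live requests, and the URLs are only illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical response table: requested URL -> (status code, Location or None).
# In practice these values would be read off a Live HTTP Headers capture.
Responses = Dict[str, Tuple[int, Optional[str]]]
REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace(responses: Responses, url: str, max_hops: int = 10) -> List[Tuple[str, int]]:
    """Follow redirects from `url`, returning every (url, status) hop."""
    start = url
    hops: List[Tuple[str, int]] = []
    for _ in range(max_hops):
        status, location = responses[url]
        hops.append((url, status))
        if status not in REDIRECT_CODES or location is None:
            return hops
        url = location
    raise RuntimeError(f"redirect loop or overlong chain starting at {start}")


def new_url_ok(hops: List[Tuple[str, int]]) -> bool:
    """A new URL should answer "200 OK" directly, with no redirect first."""
    return [status for _, status in hops] == [200]


def old_url_redirects_once(hops: List[Tuple[str, int]]) -> bool:
    """An old URL should 301 exactly once, straight to a "200 OK" page."""
    return [status for _, status in hops] == [301, 200]
```

The point of the sketch is simply that an old URL should show exactly [301, 200] and a new URL exactly [200]; anything longer is a redirect chain.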

Buster13

3:02 am on Nov 30, 2009 (gmt 0)

10+ Year Member



I'd like to thank you for replying to my post, sir! I installed Xenu as you advised, and the URL below is the generated report:
http://www.example.com/httpheaders/xenu-report.htm

I fixed some minor errors, such as:
../images/x.gig TO /images/x.gif
http://example.com TO http://www.example.com
Other than the above, I don't see any other problems in the Xenu report.

I also installed the Live HTTP Headers add-on for Firefox and captured five random pages for your review:

http://www.example.com/httpheaders/forhe-t-shirts.txt
http://www.example.com/httpheaders/forhe-tops.txt

http://www.example.com/httpheaders/forhe-t-shirts-inbox.txt

http://www.example.com/httpheaders/forhe-t-shirts-inbox-1187.txt
http://www.example.com/httpheaders/forhe-tops-2rabbits-1053.txt

As you can see, they are served with a "200 OK" response code, as you mentioned.

However, when I google "lleitmotif tops" I find
TOPS ¦ FOR HE ¦ Online Shopping Wholesale Clothing - example.com
example.com/shop/index.php?c=11

I just don't understand why Google hasn't updated such links after 2.5 weeks!

[edited by: jdMorgan at 1:24 pm (utc) on Nov. 30, 2009]
[edit reason] example.com [/edit]

jdMorgan

1:48 pm on Nov 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please note that all URLs in this thread have been changed to example.com to comport with our Terms of Service [webmasterworld.com], and to protect your site's listings in search results (It's doubtful that you want any of the top results for a search on your domain to be a link to this thread, especially since it points out vulnerabilities which may be exploitable by your competition).

In addition to checking the "new good" URLs, you also need to check the "old bad" ones, and verify that the response for an old URL is either a 301 redirect to a very-relevant replacement page, a 410-Gone response, or a 404-Not Found -- and that only one response is involved; That is, if you request an old URL and see first a 302 redirect to a different URL, followed by a 404 or 410 at that URL, then that is a big problem.
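A small Python sketch of the acceptance rule described above, applied to the sequence of status codes seen when requesting an old URL and following any redirects. The function name and the exact list of accepted sequences are one reading of the rule, not a standard.

```python
from typing import List


def old_url_ok(statuses: List[int]) -> bool:
    """Return True if the status sequence seen for an OLD URL is acceptable.

    `statuses` is what you observe when requesting the old URL and following
    any redirects, e.g. [301, 200] or [302, 404].
    """
    if not statuses:
        return False
    first = statuses[0]
    if first in (404, 410):
        # Not Found / Gone served directly, with no redirect in front of it.
        return len(statuses) == 1
    if first == 301:
        # Exactly one permanent redirect, landing on a real page.
        return statuses[1:] == [200]
    # A 200 (old URL still live), a 302, or anything else is a problem.
    return False
```

For example, [301, 200], [410] and [404] pass, while [302, 404] (the "big problem" case) and [200] (an old URL still answering normally) fail.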

At a higher level, if you have not redirected your old URLs to your new URLs, then there is no reason to expect that Google will update their index of your site in only a few weeks. Looking at the threads in our Google search forum should make it clear that although Google returns thousands of search results in less than one second, it often takes them a long time to index new sites or to update their index of old sites when URLs are changed; a realistic expectation in the case of massive URL changes would be on the order of two to nine months, not 2.5 weeks.

This fact should enter into the decision to change your URLs, and more importantly, it should illustrate the value of developing a "URL system" that is well-planned, flexible to allow for growth in the site and changes in its focus, and that therefore will never need to be changed again for any reason.

If you view the Web as search engines do --more like a library than a street-corner magazine/newspaper stand-- then it's easy to realize that they "do not like it" when you enter their library and start changing the titles on all of the books on their shelves that you have written. Since they must also continue to add millions of new books to this library every day, you can understand that it may take them quite a long time to correct their book catalog because *you* have caused them to have to do all this extra work.

Either implement 301 redirects for your top 100 most-important old URLs to speed things up, or wait six months and then check again... In the meantime, work on new content and on improving your backlink profile. And be very sure that your new URL-scheme is well-planned for the future and won't have to change again [w3.org].

For anyone else reading this thread who may be considering a similar change, it's generally recommended that you make such changes a few months before the time of year when your traffic is lowest, and not just before the holiday shopping season.

Jim

Buster13

8:35 am on Dec 2, 2009 (gmt 0)

10+ Year Member



[In addition to checking the "new good" URLs, you also need to check the "old bad" ones, and verify that the response for an old URL is either a 301 redirect to a very-relevant replacement page, a 410-Gone response, or a 404-Not Found -- and that only one response is involved; That is, if you request an old URL and see first a 302 redirect to a different URL, followed by a 404 or 410 at that URL, then that is a big problem.]

So, in short, for my PHP shopping cart, you mean:

Someone enters static -> I rewrite to dynamic (done)
example.com/shop/mens/ -> example.com/shop/index.php?c=3

Someone enters dynamic -> I NEED to redirect to static (NOT done yet)
example.com/shop/index.php?c=3 -> example.com/shop/mens/

At present, if someone enters "example.com/shop/index.php?c=3" on my website, the page loads as normal (HTTP/1.x 200 OK) and the URL stays as "example.com/shop/index.php?c=3" in the browser.

Correct me if I'm wrong, but I should add a second set of rules to redirect
example.com/shop/index.php?c=3 -> example.com/shop/mens/ [301, L]

Is there any advantage to setting [410, L] for the old dynamic URLs instead of [301, L], so that Google drops them faster and picks up my NEW static URLs?

g1smd

9:36 am on Dec 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, you need some extra rules to redirect people to the new URL if they ask for the old URL.

That is, your old URLs currently return "200 OK" so there is no way for any agent to know they are "old" URLs. The 301 redirect would tell them to update their listings to use the new URLs.

You could send "410 gone" for the old URLs, but you need to consider several important points. Google hasn't fully indexed and ranked all your new URLs yet. That process might take several months, so the new URLs might not be able to bring a lot of traffic for a while. It's the old URLs that are bringing the traffic, and those may take a few months to delist. Where the old URLs rank, they still bring traffic to your site. The last thing you want to do to those visitors is lead them to a dead end. Also, suddenly delisting all your old URLs when the new URLs aren't yet fully indexed and ranking will certainly lead to a huge fall in traffic. This is why the redirect is preferred. Google will index the new URLs at whatever rate they happen to work at, and will delist old URLs over some period of time. Visitors will see a mix of old and new in the SERPs. Visitors still get to the right content by clicking any link, and most won't even notice the URL change in the browser URL bar if it is an old URL. Over time the old URLs will be replaced with new URLs in listings.

The combination of links on your pages now pointing to the new URLs, and requests for old URLs being redirected to the new URLs will ensure that bots and users can access all your content while being prompted to prefer the new URLs. The redirect allows this to happen at whatever rate Google wants to work at. Use Google WebmasterTools to monitor progress and look for problems.

One caveat. Make sure that any redirect rules take the user from old to new URL in a single step. Check this happens for both www and non-www requests. Avoid any sort of multiple-step redirection chain as this will cause the update process to stall, as well as fail to pass on any benefits from incoming links pointing at old URLs.

Over time, make sure you monitor where your external incoming links come from, contact those other sites, and ask them to update their links to point to the new URLs. If they run regular outgoing-link reports for their sites, they will spot the 301 that your site sends and update their links anyway. If you simply return 404 or 410, there's a danger that they will just delete the link to your site when they spot the error in their report.

The redirect is preferred. Using .htaccess you should trap external incoming requests for old URLs, and rewrite them to a script that looks up the new URL in the database. The script then sends a 301 HTTP header and the new URL. You could handle the whole thing by adding lots of redirects to your .htaccess file but it will be a lot of work to keep everything up to date. If you fail to redirect a parameter-driven URL it will appear as Duplicate Content. You're better off letting your database handle this automatically.
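A minimal Python sketch of the lookup-based redirect script, assuming a hypothetical url_map table (old c/p values to new path) in SQLite. The real shop database will have its own schema, and answering 404 for unknown parameters is a design choice of this sketch, not something prescribed in the thread. The sample rows use the URLs from the first post.

```python
import sqlite3
from typing import Dict, Optional, Tuple

SITE = "http://www.example.com"  # always redirect to the canonical host


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory table of old parameters -> new path."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE url_map (c INTEGER NOT NULL, p INTEGER, path TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO url_map (c, p, path) VALUES (?, ?, ?)",
        [
            (44, None, "/shop/t-shirts/"),                 # category only
            (44, 1185, "/shop/mens/t-shirts/44/1185/"),    # category + product
        ],
    )
    return conn


def lookup_redirect(conn: sqlite3.Connection, c: int, p: Optional[int]) -> Tuple[str, Dict[str, str]]:
    """Return (status line, headers) for an old parameter-driven request."""
    # In SQLite, "p IS ?" treats NULL as equal to NULL, so p=None finds the
    # category-only row.
    row = conn.execute(
        "SELECT path FROM url_map WHERE c = ? AND p IS ?", (c, p)
    ).fetchone()
    if row is None:
        return "404 Not Found", {}
    return "301 Moved Permanently", {"Location": SITE + row[0]}
```

Keeping the mapping in the database means a new category or product never needs a new .htaccess line.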

Buster13

10:23 am on Dec 2, 2009 (gmt 0)

10+ Year Member



[ You could handle the whole thing by adding lots of redirects to your .htaccess file but it will be a lot of work to keep everything up to date. If you fail to redirect a parameter-driven URL it will appear as Duplicate Content. You're better off letting your database handle this automatically. ]

I agree with you. I wouldn't want to re-create another set of URLs just to handle old dynamic URLs on request. It will just make my already lengthy htaccess even longer!

My solution: if someone enters an old dynamic URL such as
example.com/shop/index.php?c=13&p=1111

I will add PHP code to index.php that will

- use the REQUEST_URI to check for the presence of c= and p=
- if they are present, treat it as an old dynamic URL
- if that is TRUE, take the values of parameters 'c' and 'p', re-create the new static URL, and then execute
header('Location: http://example.com/shop/(new static URL)');

- if NOT TRUE, do nothing (a request for a new static URL is already handled properly by .htaccess)

---
This way, I avoid manual redirect errors, and my site's index.php handles everything automatically. Is my solution OK?
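The detection half of this plan can be sketched in Python with the standard urllib.parse module. old_params is a hypothetical helper. On its own it cannot tell a direct client request from one that .htaccess has internally rewritten to index.php?c=..., which matters as soon as static URLs are rewritten to the same script.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def old_params(request_uri: str) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return (c, p) if the request URI looks like an old dynamic URL, else None."""
    query = parse_qs(urlsplit(request_uri).query)
    if "c" not in query and "p" not in query:
        return None  # a new static URL: nothing to do
    # parse_qs returns lists; take the first value of each parameter, if any.
    c = query.get("c", [None])[0]
    p = query.get("p", [None])[0]
    return c, p
```

For example, "/shop/index.php?c=13&p=1111" gives ("13", "1111"), while "/shop/mens/" gives None.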

g1smd

12:31 pm on Dec 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Almost. Your solution produces a 302 redirect.

You need an additional HEADER before the one with the URL. The extra HEADER sends "Status: 301".

Refer to the PHP documentation for the exact syntax.

But yes, you have the essence of the deal. Make sure that the redirect headers are the very first thing to be sent out (way before any HTML, if any). Verify correct operation using Live HTTP Headers.
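For comparison, a minimal Python WSGI sketch that issues a permanent redirect with the status set explicitly. In PHP itself the equivalent is to send the status line before the Location header, or to call header("Location: $url", true, 301), because PHP answers with a 302 when only a Location header is sent. The target URL is a placeholder.

```python
from typing import Callable, Iterable, List, Tuple

StartResponse = Callable[[str, List[Tuple[str, str]]], None]


def permanent_redirect(target: str) -> Callable[[dict, StartResponse], Iterable[bytes]]:
    """Build a WSGI app that answers every request with a 301 to `target`."""

    def app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        # Set the 301 status explicitly; PHP, by contrast, falls back to a
        # 302 when only a Location header is sent.
        start_response(
            "301 Moved Permanently",
            [("Location", target), ("Content-Length", "0")],
        )
        return [b""]  # nothing is output before (or after) the headers

    return app
```

To try it locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve the app, and Live HTTP Headers should then show a single 301.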

.

I would handle it very slightly differently. I would add a rule at the start of the .htaccess file, one that 'detects' the c= and p= parameters, and rewrites (that's rewrite not redirect) those requests to a brand new /my-custom-redirects.php script which would do all the work. In that way, the new redirect script will never be affected by any changes or upgrades to the main site scripting files.

That's a minor implementation detail, but it keeps "my code" away from "their code".

Buster13

12:47 pm on Dec 2, 2009 (gmt 0)

10+ Year Member



[ I would add a rule at the start of the .htaccess file, one that 'detects' the c= and p= parameters ]

Any examples for this part? I am kind of lost. I figured it would be something like:

http://example.com/shop/index.php?c=13&p=6821

RewriteEngine On
RewriteBase /
RewriteCond %{QUERY_STRING} c= p=
RewriteRule ^/shop/(.*)$ /redirect.php [L]

* By the way, my c and p values are numbers only.

g1smd

1:12 pm on Dec 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would try...

RewriteCond %{QUERY_STRING} &?(c=|p=)
RewriteRule ^shop/ /my-redirect.php [L]

or similar.

This rewrite needs to go before any other rewrites that could match this URL.

Buster13

1:32 pm on Dec 2, 2009 (gmt 0)

10+ Year Member



I now get this error in the browser when I try my new static URL:

The page isn't redirecting properly

I am sure it reaches redirect.php when I try a dynamic URL, because I corrected some PHP errors in it. Could it be an endless loop within .htaccess?

static -> rewrite to index.php?c=xx -> query string caught -> rewrite to static

---
My redirect.php works! How do I know?
I entered /shop/index.php?c=2

and the URL in the browser became /shop/mens/ (which is correct);
however, the page then fails to load.

When I comment out the 2 newly added lines, /shop/mens/ works properly.
That is why I suspect these 2 lines CONFLICT with the existing static -> dynamic rewrite rules.

---
In my redirect.php, I use:

$static_url = rewriteURL ($c, $p);

header("HTTP/1.1 301 Moved Permanently");
header("Location: $static_url"); /* Redirect browser */

---
I placed the 2 lines BEFORE the rest of the rules. I have a hunch they should be at the bottom of .htaccess, because

a dynamic URL will not be caught by the static->dynamic rules; it goes all the way down the .htaccess until it is caught by the QUERY_STRING rule, gets converted to static and is 301-redirected via redirect.php... then it revisits .htaccess as the new static URL and gets caught by the static->dynamic rules.

[edited by: Buster13 at 1:49 pm (utc) on Dec. 2, 2009]

jdMorgan

1:49 pm on Dec 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The loop is due to the actions of both .htaccess and PHP:
static-> .htaccess rewrite to index.php?c=xx -> query string caught in PHP -> PHP rewrite to static -> Repeat "forever"

Your PHP has to check to see that the dynamic URL is being requested directly by a client (browser or SE robot) before redirecting, and not as the result of the rewrite in .htaccess.

I'm not sure how you do that in PHP. In mod_rewrite, one method is to check the server variable "%{THE_REQUEST}" which contains the exact HTTP request line sent by the client, unaffected by any internal rewrites. So if this string contains a GET for a dynamic URL-path, you know you need to redirect it. If it contains a static URL-path, you know you must not redirect it. However, I don't think that this variable is available (at least not with that name) to PHP.

Another method is to check REDIRECT_STATUS to see if the request has already been internally rewritten (note the terminology inconsistency here). But again, I'm not sure how you do that within PHP.

Perhaps someone else following this thread knows the answers, or you could add a bit of temporary code to your PHP script to dump all the PHP variables and look for either the client request line or a variable indicating the redirect (i.e. rewrite) status.

Unfortunately, you don't have the option of putting the dynamic->static redirect in .htaccess and using %{THE_REQUEST}, because the dynamic URL doesn't contain all the information needed to build the static URL.

If PHP does not have access to a variable that can be used to determine whether the client directly requested a dynamic URL, then an alternative is to change the query parameter names used to call the script (so that the new dynamic filepath no longer resembles the old dynamic URL and you can therefore tell them apart) or to add a new query parameter in your rewriterule that indicates that you're calling the script as a result of an internal rewrite. So if this new parameter isn't present, then you will know that you need to redirect...

Jim
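The loop, and the "extra query parameter" cure described above, can be modelled in a few lines of Python. This is a toy model, not Apache: handle() stands in for the .htaccess rewrite plus the PHP redirect, the single mapping /shop/mens/ <-> c=2 is taken from this thread, and the marker parameter name internal is hypothetical. A client could add that marker by hand, so a real setup might prefer THE_REQUEST or REDIRECT_STATUS.

```python
from typing import Dict, Optional, Tuple

# One hypothetical mapping taken from this thread: /shop/mens/ <-> c=2.
STATIC_TO_C = {"/shop/mens/": "2"}
C_TO_STATIC = {c: path for path, c in STATIC_TO_C.items()}


def handle(path: str, query: Dict[str, str], use_marker: bool) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for one client request."""
    # Step 1 (.htaccess): internally rewrite static URLs to the script.
    if path in STATIC_TO_C:
        query = {"c": STATIC_TO_C[path]}
        if use_marker:
            query["internal"] = "1"  # hypothetical "I was rewritten" marker
        path = "/shop/index.php"
    # Step 2 (PHP): redirect anything that looks like an old dynamic URL.
    if path == "/shop/index.php" and "c" in query and "internal" not in query:
        return 301, C_TO_STATIC[query["c"]]
    return 200, None


def follow(path: str, query: Dict[str, str], use_marker: bool, max_hops: int = 5) -> int:
    """Count the redirects a browser follows; raise if it never settles."""
    for hops in range(max_hops):
        status, location = handle(path, query, use_marker)
        if status != 301:
            return hops
        path, query = location, {}
    raise RuntimeError("redirect loop")
```

Without the marker, a request for /shop/mens/ bounces forever (the Firefox "isn't redirecting properly" error); with it, /shop/mens/ is served directly and /shop/index.php?c=2 takes exactly one redirect.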

Buster13

2:26 pm on Dec 2, 2009 (gmt 0)

10+ Year Member



Not sure what you mean above; it's too difficult for me to grasp. I understand my looping problem like this:

dynamic (index.php?c=xx)
¦
htaccess dynamic caught in QUERY STRING
hence redirect.php
¦
redirect.php re-create static url using $c and $p
301 and header Location...
¦
htaccess static url caught in static->dynamic rule set
¦
htaccess dynamic caught in QUERY STRING
hence redirect.php
¦
(repeats)

---
Is there a condition I can add in .htaccess such that:

IF http_request is FROM /redirect.php {do not get caught in QUERY STRING}

jdMorgan

2:44 pm on Dec 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, .htaccess has no concept of "FROM /redirect.php". The cure is as I described above.

Jim

Buster13

2:50 pm on Dec 2, 2009 (gmt 0)

10+ Year Member



I found this topic, which is similar to my case:

[snipplr.com...]

---
Following the example above, I modified it to:

RewriteRule ^shop/men/?$ shop/index.php?c=2 [L]
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^(.*&)?c=2(&.*)?$ [NC]
RewriteRule ^shop/index\.php$ /shop/men/? [NC,R=301,L]

Now when I enter /shop/index.php?c=2
it becomes /shop/mens/
with 301 captured in HTTP header

it works!
---
Can you explain to me what the 4 lines do?

[edited by: jdMorgan at 2:58 pm (utc) on Dec. 2, 2009]
[edit reason] Disabled smilies in code. [/edit]
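In short, the first line internally rewrites /shop/men/ to shop/index.php?c=2. The last three fire only for a direct client request (REDIRECT_STATUS is empty) whose query string contains exactly the parameter c=2, and they 301-redirect it to /shop/men/; the trailing "?" on the substitution drops the old query string. The Python sketch below mimics just the two RewriteCond tests; treating REDIRECT_STATUS as non-empty (typically "200") after an internal rewrite is the assumption it models.

```python
import re

# Second RewriteCond from the snippet ([NC] -> IGNORECASE): is "c=2" present
# as a whole parameter, optionally with other parameters before or after it?
C2_PATTERN = re.compile(r"^(.*&)?c=2(&.*)?$", re.IGNORECASE)


def would_redirect(query_string: str, redirect_status: str) -> bool:
    """Mimic the two RewriteConds guarding the R=301 rule."""
    # First RewriteCond: REDIRECT_STATUS is empty only on the original
    # client request, not after an internal rewrite.
    if redirect_status != "":
        return False
    return C2_PATTERN.match(query_string) is not None
```

Note that c=23 and abc=2 do not match, so the pattern only catches that one parameter value, which is why it has to be repeated for every category.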

Buster13

2:57 pm on Dec 2, 2009 (gmt 0)

10+ Year Member



-.-"

#RewriteRule ^shop/men/?$ shop/index.php?c=2 [L]
#RewriteCond %{ENV:REDIRECT_STATUS} ^$
#RewriteCond %{QUERY_STRING} ^(.*&)?c=2(&.*)?$ [NC]
#RewriteRule ^shop/index\.php$ /shop/men/? [NC,R=301,L]

RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} &?(c=|p=)
RewriteRule ^shop/ /redirect.php [L]

---
It works like a charm, BETTER than those 4 commented-out (#) lines:
because it rewrites to redirect.php, it acts as a wildcard, accepting any values of c and p and properly converting them to a static URL.
---
I captured the HTTP header info
when I entered /shop/index.php?c=2:

http://www.example.net/shop/index.php?c=2
GET /shop/index.php?c=2 HTTP/1.1

HTTP/1.x 301 Moved Permanently

http://www.example.net/shop/mens/

HTTP/1.x 200 OK
---
My last 2 questions are:

1) Is my captured HTTP info logical?

2) Are there better ways to tweak those bottom 3 lines? My c and p are numbers only.

[edited by: jdMorgan at 3:17 pm (utc) on Dec. 2, 2009]
[edit reason] Disabled smilies in code. [/edit]
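One possible tightening, since c and p are numeric: match c= or p= only as a whole parameter name followed by digits. The &?(c=|p=) pattern also matches inside names such as abc= because the & is optional. The pattern below is a suggestion, not taken from the thread; it is tested here with Python's re module, and in .htaccess it would go in a RewriteCond %{QUERY_STRING} line.

```python
import re

# Suggested (hypothetical) pattern: "c" or "p" as a complete parameter name,
# at the start of the query string or right after "&", followed by digits.
NUMERIC_CP = re.compile(r"(^|&)[cp]=[0-9]+(&|$)")


def has_numeric_cp(query_string: str) -> bool:
    """True if the query string carries a numeric c= or p= parameter."""
    return NUMERIC_CP.search(query_string) is not None
```

For example, "c=13&p=6821" and "x=1&p=9" match, while "abc=2", "topic=5" and "c=abc" do not.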

jdMorgan

3:11 pm on Dec 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, here you are testing REDIRECT_STATUS as I described above. As long as redirect.php can 'pull' all the information it needs from the dynamic URL (and/or your database), then this is a good solution -- My comments above assumed incorrectly that your PHP redirect was *not* a separate script, as it obviously is here.

So this becomes a third viable option.

A fourth is to test THE_REQUEST, allowing a two-line mod_rewrite solution:


RewriteCond %{THE_REQUEST} ^[A-Z]+\ /shop/[^?]*\?([^&]*&)*(c=|p=)[^\ ]*\ HTTP/
RewriteRule ^shop/ /redirect.php [L]

Here, THE_REQUEST contains the original client HTTP request, as seen in your raw server logs. For example:
GET /shop/mens/ HTTP/1.1
which won't be sent to redirect.php because it is static, or

GET /shop/index.php?c=2 HTTP/1.1
which will be sent to redirect.php, because it contains the "c=" parameter.

Further, if all dynamic URL requests also always contain "index.php", you can make the rule more selective (to prevent unexpected results in the future) and also more efficient:


RewriteCond %{THE_REQUEST} ^[A-Z]+\ /shop/index.php\?([^&]*&)*(c=|p=)[^\ ]*\ HTTP/
RewriteRule ^shop/index\.php$ /redirect.php [L]

Note: this forum replaces the solid pipe character "|" with a broken pipe "¦" when posting, so if you copy any RewriteCond pattern from this thread, make sure each alternation uses a solid pipe before use.

Jim
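To unpack the THE_REQUEST condition: ^[A-Z]+\  matches the method and a space, /shop/index.php\? matches the path up to the query string, ([^&]*&)* skips any whole parameters that come first, (c=|p=) requires c= or p= at the start of a parameter, and [^\ ]*\ HTTP/ consumes the rest of the URL up to the protocol. Because THE_REQUEST is the client's original request line, an internally rewritten static URL never matches. The Python sketch below runs both patterns against sample request lines; Python's re handles these particular constructs the same way as the PCRE engine mod_rewrite uses.

```python
import re

# Both patterns from the post above, with a solid pipe in the alternation.
GENERIC = re.compile(r"^[A-Z]+\ /shop/[^?]*\?([^&]*&)*(c=|p=)[^\ ]*\ HTTP/")
SPECIFIC = re.compile(r"^[A-Z]+\ /shop/index.php\?([^&]*&)*(c=|p=)[^\ ]*\ HTTP/")


def goes_to_redirect_script(request_line: str, pattern: re.Pattern = SPECIFIC) -> bool:
    """True if THE_REQUEST would be handed to redirect.php."""
    return pattern.search(request_line) is not None
```

Both patterns leave "GET /shop/mens/ HTTP/1.1" alone and catch "GET /shop/index.php?c=2 HTTP/1.1". Only the generic one also catches a request such as "GET /shop/?c=2 HTTP/1.1", which is the trade-off between selectivity and coverage discussed later in the thread.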

Buster13

3:43 pm on Dec 2, 2009 (gmt 0)

10+ Year Member



[RewriteCond %{THE_REQUEST} ^[A-Z]+\ /shop/index.php\?([^&]*&)*(c=|p=)[^\ ]*\ HTTP/
RewriteRule ^shop/index\.php$ /redirect.php [L] ]

What can I say? It just works perfectly. I'm not sure what your first line means; it's really too tough for me to understand... but it just works. Maybe you could break it up and explain the Cond for me ;p
---

Last but not least, I want to check with you: is a double 301 redirect normal? I ask because my previous post showed only one 301, but of course that did not make use of redirect.php.
Will a double 301 redirect get my website penalized?

http://www.example.net/shop/index.php?c=2

HTTP/1.x 301 Moved Permanently

http://www.example.net/redirect.php?c=23&p=1022

HTTP/1.x 301 Moved Permanently

http://www.example.net/shop/womens/footwear/18/1022/

HTTP/1.x 200 OK

---

[edited by: jdMorgan at 3:57 pm (utc) on Dec. 2, 2009]
[edit reason] Disabled smilies in code. [/edit]

g1smd

4:03 pm on Dec 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There should only be one redirect.

At the moment it looks like your normal script takes a c= request (with p= missing) and is redirecting it to a new parameter-driven URL and then your redirect script takes over and issues the correct redirect.

You need to set something up in redirect.php for URL requests where only c= is present.

jdMorgan

4:05 pm on Dec 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No that's not normal, and needs to be corrected, as it will hurt your search rankings. Or rather, it will defeat your efforts to correct and help your search rankings.

A direct client request for http://www.example.com/shop/index.php?c=2 should be redirected "all in one step" to http://www.example.com/shop/womens/footwear/18/1022/

So look at your redirect.php script to see why this double-redirect might be happening. Is it because the "p=" parameter is missing from the client-requested dynamic URL? Maybe it's not being done by redirect.php, but by some other code; Somehow, a redirect is being triggered to add "p=1022" and to change the category to 23 when the requested URL doesn't have a "p=" value, and possibly only when c=2. So you need to find that code to identify the problem.

To understand the RewriteCond I posted, take a look at the regular-expressions tutorial cited in our Apache Forum Charter [webmasterworld.com]. Regular expressions are useful in almost all modern programming and scripting languages, and a working knowledge of them is quite valuable.

Jim

Buster13

4:36 pm on Dec 2, 2009 (gmt 0)

10+ Year Member



[Maybe it's not being done by redirect.php, but by some other code;]

Yes, I have this in redirect.php:

header("HTTP/1.1 301 Moved Permanently");
header("Location: $static_url"); /* Redirect browser */

I thought this was the normal standard for a redirect.php.
Should I remove the 301 in redirect.php?
---
I tried removing the 301 from redirect.php...
I ended up with only HTTP/1.x 302 Moved Temporarily (I bet from the Location header).
---
I believe a 301 is needed in my situation, not a 302 Moved Temporarily.
What should I do?

Buster13

4:55 pm on Dec 2, 2009 (gmt 0)

10+ Year Member



Ignore my post above. I rechecked and now see only one 301 redirect. Maybe I had not refreshed my browser.

I want to thank g1smd for giving me the idea of having a separate redirect.php and not touching the code in my website at all.

I want to thank Jim for giving me the RewriteCond code for my .htaccess that lets me make use of redirect.php.

With the advice from both of you, I managed to get
dynamic -> static AND static -> dynamic working.
---
I want to summarize to help others:
---
:: code in .htaccess ::

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /shop/index.php\?([^&]*&)*(c=|p=)[^\ ]*\ HTTP/
RewriteRule ^shop/index\.php$ /redirect.php [L]

:: code in redirect.php ::
<?php
if ( (isset($_GET['c']) && !is_numeric($_GET['c'])) || (isset($_GET['p']) && !is_numeric($_GET['p'])) ) {
header("Location: http://www.example.net/shop/error.php");
exit;
}

$catId = (isset($_GET['c']) && $_GET['c'] != '1') ? $_GET['c'] : 0;
$pdId = (isset($_GET['p']) && $_GET['p'] != '') ? $_GET['p'] : 0;

$static_url = xrewriteURL($catId, $pdId);

header("HTTP/1.1 301 Moved Permanently");
header("Location: $static_url"); /* Redirect browser */

/* Make sure that code below does not get executed when we redirect. */
exit;
?>

Cheers

[edited by: jdMorgan at 6:28 pm (utc) on Dec. 2, 2009]
[edit reason] Disabled smilies in code [/edit]

Buster13

5:12 pm on Dec 2, 2009 (gmt 0)

10+ Year Member



T.T" I laughed too soon: there still seems to be a situation where two 301s occur. E.g., when I click this Google search result:
http://example.net/shop/index.php?c=0&p=1018

:: Live HTTP header Capture ::

GET /shop/index.php?c=0&p=1018 HTTP/1.1
Host: www.google.com.sg
----------------------------------------------------------
http://example.net/shop/index.php?c=0&p=1018

GET /shop/index.php?c=0&p=1018 HTTP/1.1
Host: example.net
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4 (.NET CLR 3.5.30729)

HTTP/1.x 301 Moved Permanently
Date: Wed, 02 Dec 2009 17:05:31 GMT
Server: Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 mod_bwlimited/1.4
Location: http://www.example.net/redirect.php?c=0&p=1018
----------------------------------------------------------
http://www.example.net/redirect.php?c=0&p=1018

GET /redirect.php?c=0&p=1018 HTTP/1.1
Host: www.example.net
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4 (.NET CLR 3.5.30729)

HTTP/1.x 301 Moved Permanently
Date: Wed, 02 Dec 2009 17:05:31 GMT
Location: /shop/womens/footwear/18/1018/
----------------------------------------------------------
http://www.example.net/shop/womens/footwear/18/1018/

GET /shop/womens/footwear/18/1018/ HTTP/1.1
Host: www.example.net
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4 (.NET CLR 3.5.30729)

HTTP/1.x 200 OK
Date: Wed, 02 Dec 2009 17:05:31 GMT
Server: Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 mod_bwlimited/1.4

=======================
My possible solutions are:
a) remove the 301 from redirect.php (resulting in the default 302), OR
b) change RewriteRule ^shop/index\.php$ /redirect.php [L] to [R=302,L]

just my 2 cents.

[edited by: jdMorgan at 1:30 pm (utc) on Dec. 3, 2009]
[edit reason] Fixed side-scroll [/edit]

g1smd

6:38 pm on Dec 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No. Any redirect must be a 301 redirect. Don't use a 302.

In the code several posts back, you have a 302 redirect to an error page. That error page returns "200 OK", which tells agents that it isn't an error at all.

You need to serve a 404 header and an error message instead of that 302 redirect.
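A Python sketch of that advice for the parameter check in redirect.php: reject bad values with a real 404 and a page body, and otherwise issue the 301. The digits-only test is an assumption (PHP's is_numeric also accepts values such as "1.5" or "1e3"), and static_url stands for whatever the database lookup returns. As a general Apache point, ErrorDocument is used for errors Apache generates itself; when a script sets a 404 status, the script is normally responsible for printing the error page too.

```python
import re
from typing import Dict, Optional, Tuple

Response = Tuple[str, Dict[str, str], bytes]
DIGITS = re.compile(r"[0-9]+")

# Placeholder page body. When a script sets the 404 itself, it should also
# send the page, since Apache's ErrorDocument typically won't supply one.
NOT_FOUND_BODY = b"<html><body><h1>Sorry, that page could not be found.</h1></body></html>"


def respond(c: Optional[str], p: Optional[str], static_url: str) -> Response:
    """Return (status, headers, body) for an old parameter-driven request."""
    for value in (c, p):
        if value is not None and not DIGITS.fullmatch(value):
            # A real 404, not a redirect to an error page: a 302 to a page
            # answering "200 OK" tells robots the URL is fine.
            return "404 Not Found", {"Content-Type": "text/html"}, NOT_FOUND_BODY
    return "301 Moved Permanently", {"Location": static_url}, b""
```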

.

For this last problem (immediately above), when you request a non-www URL with parameters there is a redirect to www (in your .htaccess) happening before the rewrite to the special script is occurring. This means that for a non-www request with parameters the .htaccess rule fixes the www with a redirect and the script then separately fixes the parameters problem with another redirect.

You could move the line that does the rewrite to the special script to be placed before the non-www to www redirect in your .htaccess file -OR- you could change the non-www to www redirect in your .htaccess file (by adding a RewriteCond) so that it does not redirect from non-www to www for URL requests that contain c= and/or p= parameters (leaving the script to fix the www problem for those requests at the same time it fixes the parameters).

I prefer the latter method. Whichever you choose, this allows the rewrite to the special script to pick up any www or non-www URL request containing parameters, and the script itself to work its magic in one hit, producing both the correct URL path and the correct domain within that single script-generated redirect.
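A toy Python model of the rule order described above, to check that every request needs at most one redirect. CANONICAL_HOST and the one-entry STATIC_URLS table are hypothetical, and query strings are simplified. Swapping rules 1 and 2 would reproduce the two-hop chain seen in the capture above for non-www parameter requests.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlsplit

CANONICAL_HOST = "www.example.net"
# Hypothetical lookup table standing in for the database used by redirect.php.
STATIC_URLS = {("0", "1018"): "/shop/womens/footwear/18/1018/"}


def server(host: str, path: str, query: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Toy model of the rule order discussed above (query strings simplified)."""
    # 1. Parameter requests go to the script on ANY host; the script fixes
    #    both host and path in a single 301 to an absolute www URL.
    if path == "/shop/index.php" and ("c" in query or "p" in query):
        target = STATIC_URLS[(query.get("c"), query.get("p"))]
        return 301, "http://" + CANONICAL_HOST + target
    # 2. Generic non-www to www redirect for everything else.
    if host != CANONICAL_HOST:
        return 301, "http://" + CANONICAL_HOST + path
    # 3. Canonical static URL: internally rewritten and served as 200 OK.
    return 200, None


def count_redirects(url: str, max_hops: int = 5) -> int:
    """Count how many redirects a client sees before a non-redirect answer."""
    for hops in range(max_hops):
        parts = urlsplit(url)
        query = {k: v[0] for k, v in parse_qs(parts.query).items()}
        status, location = server(parts.hostname or "", parts.path, query)
        if status != 301:
            return hops
        url = location
    raise RuntimeError("redirect chain too long")
```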

.

In my original code (way back at the start of this thread, post #:4035351) where I suggested a rewrite to the 'special' script, I made an error in forgetting to tell you to check THE_REQUEST and not just the QUERY_STRING. It's my fault you went off on a tangent for a while; rescued by jd who spotted that. I would have quickly spotted the problem had I been testing the code on my own server; but some of the code I post here is untested.

.

I have to also ask whether a URL request with c= and/or p= parameters will always start with /shop/index.php here? Could it ever just be /shop or /shop/ instead? I would want to cater for all possibilities. You might not need to, but it might be safer to do so, just in case you ever set up any 'index to slash' redirects somewhere in your system.

[edited by: g1smd at 6:58 pm (utc) on Dec. 2, 2009]

jdMorgan

6:41 pm on Dec 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Because of the nature of this set-up, the rewriterule used to rewrite to redirect.php cannot fix the www/non-www problem, because it is *rewriting* and not redirecting to redirect.php. This is how it must be for the somewhat-special situation described in this thread.

However, that means that you cannot use mod_rewrite to do any domain canonicalization redirects of dynamic /shop/index.php?c/p= URLs, because if the requested hostname is wrong (non-www), then that error must also be fixed by redirect.php.

The internal rewrite of dynamic /shop/index.php?c/p= URLs to redirect.php can be followed by a generic domain canonicalization redirect rule in this case (as long as that internal rewrite rule has an [L] flag on it), because the rule used to rewrite to redirect.php functions as a redirect in this specific case (normally, we never want to follow a rewrite rule with a redirect rule).

Sorry if that's complicated and confusing, but the problem addressed in this thread isn't a simple one...

Jim

Buster13

3:03 am on Dec 3, 2009 (gmt 0)

10+ Year Member



Hi Guys, I am back for discussion! ;D

g1smd:
[You need to serve a 404 header and an error message instead of that 302 redirect.]

Htaccess
---
ErrorDocument 404 /shop/error.php
ErrorDocument 401 /shop/401.php

PHP index.php: commented out the old code AND added line #1
---
header($_SERVER["SERVER_PROTOCOL"]." 404 Not Found");
//header("HTTP/1.1 404 Not Found");
//header("Location: http://www.example.net/shop/error.php");
exit;

Results above
---
It now serves a 404 header instead of the previous 302 redirect followed by a 200 OK. However, I have a slight problem:

http://www.example.net/shop/mens/9999abc/
Served 404 Not Found and redirected to /shop/error.php (GOOD)

http://www.example.net/shop/mens/999999999999/
Served 404 Not Found BUT did NOT redirect to /shop/error.php (STUCK?)
The page just stays BLANK

I just don't understand why. It served a 404, so why did it not use the ErrorDocument directive in .htaccess?

=========================================================
g1smd/Jim:
You guys are right: the non-www request caused the extra 301 redirect in my Live HTTP Headers capture. Below is my solution, in which I edited both redirect.php and .htaccess:

redirect.php (commented and added www.example.net)
---
header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.example.net$static_url"); /* Redirect browser */
//header("Location: $static_url"); /* Redirect browser */

htaccess (added line #1, the !redirect.php condition)
---
#Redirect domain.com to www.domain.com:
RewriteCond %{REQUEST_URI} !redirect\.php
RewriteCond %{HTTP_HOST} ^example\.net [NC]
RewriteRule ^(.*)$ http://www.example.net/$1 [L,R=301]

Results
---
:: Scenario A (non-www) ::
http://example.net/shop/index.php?c=0&p=1018

GET /shop/index.php?c=0&p=1018 HTTP/1.1
Host: example.net

HTTP/1.x 301 Moved Permanently
Location: http://www.example.net/shop/womens/footwear/18/1018/
Connection: close

http://www.example.net/shop/womens/footwear/18/1018/

GET /shop/womens/footwear/18/1018/ HTTP/1.1
Host: www.example.net

HTTP/1.x 200 OK
Connection: close

:: Scenario B (with www) ::
http://www.example.net/shop/index.php?c=0&p=1018

GET /shop/index.php?c=0&p=1018 HTTP/1.1
Host: www.example.net

HTTP/1.x 301 Moved Permanently
Location: http://www.example.net/shop/womens/footwear/18/1018/
Connection: close

http://www.example.net/shop/womens/footwear/18/1018/

GET /shop/womens/footwear/18/1018/ HTTP/1.1
Host: www.example.net

HTTP/1.x 200 OK
Connection: close

When I compare both captures side by side, the only difference is Host: (www.)example.net
I presume I have solved the double 301 redirect problem by editing BOTH .htaccess and redirect.php.

Cheers and thanks a lot

g1smd

10:00 am on Dec 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The redirect.php special script must explicitly state "www" as a part of the redirected URL.

Both non-www and www requests with the c= and p= parameters must be rewritten (not redirected) to the special script by the .htaccess.

There must not be a non-www to www redirect within .htaccess that could fix the www problem for the c= and p= URLs before the request is passed to the script.

It looks like that is what you now have. It is important that you fully understand the 'logic path' through your .htaccess rules and scripts for all types of URL requests and ensure there are no 'redirection chains'. I often see sites that have a chain of several redirects. I once saw one that attempted to string 11 redirects together. In those cases, they have virtually no chance of bots being able to properly access and index their site.
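One way to audit that logic path is a small checker that follows Location headers one hop at a time and records each status code, which makes any redirect chain obvious. This is a hypothetical sketch, not a substitute for Live HTTP Headers. trace_redirects() and the $fetch callable are invented names. $fetch is assumed to return the status code and Location header for a single request without following redirects itself, so the chain logic can be exercised with canned responses.

```php
<?php
/**
 * Follow redirects one hop at a time and record every response.
 *
 * $fetch is a hypothetical callable: fetch(string $url) returns
 * [int $status, ?string $location] for ONE request, without following
 * redirects. Returns a list of [url, status] pairs, stopping at the first
 * non-redirect response or after $maxHops redirects have been followed.
 */
function trace_redirects(string $url, callable $fetch, int $maxHops = 10): array
{
    $hops = [];
    for ($i = 0; $i <= $maxHops; $i++) {
        [$status, $location] = $fetch($url);
        $hops[] = [$url, $status];

        $isRedirect = in_array($status, [301, 302, 303, 307, 308], true);
        if (!$isRedirect || $location === null) {
            break; // final response reached
        }
        // Assumes absolute Location URLs, as in the captures in this thread.
        $url = $location;
    }
    return $hops;
}
```

For an old dynamic URL, a healthy trace has exactly two entries: one 301, followed by one 200 at the static www URL. Anything longer is a redirect chain.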

g1smd

10:06 am on Dec 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: codes in redirect.php ::
<?php
if ( (isset($_GET['c']) && !is_numeric($_GET['c'])) || (isset($_GET['p']) && !is_numeric($_GET['p'])) ) {
header("Location: http://www.example.net/shop/error.php");
exit;
}

As for your ErrorDocument problem, hopefully I can explain that.

The ErrorDocument is a file on your server. If you access that document using its native URL then, as with all documents, it will return the status "200 OK".

If you redirect to an ErrorDocument URL, the browser is being asked to request a different URL. The user will first see a 301 or a 302 redirect status, followed by the 200 status for the document itself.

So, your code above sends a 302 redirect and the browser makes a new request for http://www.example.net/shop/error.php and that document then serves a "200 OK" status (or a 404 status code if the very first line of the error document is programmed to send that instead). The main problem is the 302 status served for the original URL request.

What should happen, is that if a URL does not exist, the server should immediately send a "404 Not Found" header and then the contents of the ErrorDocument file, and the URL bar of the browser should not change. The 404 status code should always be served for the currently requested URL.

So, you could send a HEADER value of 404 and then locally INCLUDE the contents of the document, referencing it as a local file within the server filesystem, not calling it as an external URL.
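Here is a minimal PHP sketch of that approach. render_not_found() is a hypothetical name. http_response_code() (PHP 5.4 and later) stands in for the header() call used elsewhere in this thread. The error page is assumed to be passed as a filesystem path such as __DIR__ . '/error.php'. No Location header is sent, so the browser keeps the originally requested URL and receives the 404 status together with the error page body.

```php
<?php
/**
 * Hypothetical helper: answer the current request with a 404 status
 * and the body of a local error document, without any redirect.
 *
 * $errorFile is a filesystem path (e.g. __DIR__ . '/error.php'),
 * never a URL, so no Location header is ever emitted.
 * Returns the rendered body so the caller can echo it.
 */
function render_not_found(string $errorFile): string
{
    if (!headers_sent()) {
        http_response_code(404); // status for the URL the user requested
    }
    ob_start();           // capture whatever the error page outputs
    include $errorFile;   // local include, not an HTTP request
    return (string) ob_get_clean();
}

// Typical use at the point where validation fails:
// echo render_not_found(__DIR__ . '/error.php');
// exit;
```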

If you call the ErrorDocument as a URL, the user will 'see' that URL and will see a 30x redirect status code for the originally requested URL. That means the user does not see a 404 status code for the originally requested URL - and that is a huge problem.

I hope by now you can see that everything must be evaluated using the URLs users see in the browser URL bar, and the status code that is returned for that URL request. That is, everything to do with mod_rewrite is handled by looking at what the browser asks for and what the server returns for that request.

Buster13

1:21 pm on Dec 3, 2009 (gmt 0)

10+ Year Member



I had this code in my /shop/index.php to validate and sanitize $c and $p:

if ( (isset($_GET['c']) && !is_numeric($_GET['c'])) || (isset($_GET['p']) && !is_numeric($_GET['p'])) ) {
header($_SERVER["SERVER_PROTOCOL"]." 404 Not Found");
exit;
}

A check like the one above is repeated 4 times in index.php:

e.g. if (cond 1 fails) {execute header}

require_once 'library/config.php';
require_once 'library/category-functions.php';
require_once 'library/product-functions.php';
require_once 'library/cart-functions.php';

if (cond 2 fails) {execute header}
if (cond 3 fails) {execute header}
if (cond 4 fails) {execute header}

----
The funny thing is that only cond 1 is able to display the content of error.php (and yes, the wrong URL remains unchanged in the browser).

Conds 2, 3 and 4 all give 404 Not Found, and then my browser shows a blank white page (the captured HTTP headers end right after the 404 Not Found).

What is funny here is: why can condition 1 display the contents of error.php and not the rest? The only difference I can think of is that I include my library files (required for conds 2, 3 and 4) after cond 1. I tested conds 2, 3 and 4 by typing URLs that get trapped, and they really do work and give 404 Not Found, but they just won't display the contents of error.php.
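One way to make all four conditions behave alike is to send every failure through a single exit path that emits the 404 status and then includes the error page. A bare header() call, by contrast, produces a response with no body. The sketch below also swaps is_numeric() for a stricter digit-only test, since is_numeric() accepts values such as "1e3" or "12.5". The names is_valid_id() and not_found() are hypothetical. The real conditions 2 to 4 are not shown in the post, and error.php is assumed to sit in the same directory as the file that defines not_found().

```php
<?php
/**
 * Stricter ID check than is_numeric(), which also accepts strings
 * such as "1e3", "-1" and "12.5" that are not valid category/product IDs.
 */
function is_valid_id($value): bool
{
    return is_string($value) && ctype_digit($value);
}

/**
 * Single exit path for every failed check (cond 1 to 4): send the 404
 * status, output the error page from the filesystem, then stop.
 * Assumption: error.php lives in the same directory as this file.
 */
function not_found(): void
{
    http_response_code(404);
    include __DIR__ . '/error.php';
    exit;
}

// In index.php (sketch):
// if (isset($_GET['c']) && !is_valid_id($_GET['c'])) { not_found(); } // cond 1
// require_once 'library/config.php';
// ... the other require_once lines ...
// if (/* cond 2 fails */) { not_found(); }
// if (/* cond 3 fails */) { not_found(); }
// if (/* cond 4 fails */) { not_found(); }
```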

jdMorgan

1:47 pm on Dec 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I suspect you need to send the 404 header, and then include the error page.

I'm not experienced at PHP coding, but the idea is like this:

redirect.php:


if ( (isset($_GET['c']) && !is_numeric($_GET['c'])) || (isset($_GET['p']) && !is_numeric($_GET['p'])) ) {
header($_SERVER["SERVER_PROTOCOL"]." 404 Not Found");
require_once $_SERVER['DOCUMENT_ROOT'] . '/shop/error.php'; // filesystem path, not a URL
exit;
}

Otherwise, you're sending a 404 response with a blank content-body.

Jim

This 36-message thread spans 2 pages.