ZenCart Bad Redirection Advice

         

g1smd

8:15 pm on Mar 1, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Found in the 'extras' folder of a ZenCart install that I had the 'pleasure' of fixing up last week...

Filename: htaccess_for_page_not_found_redirects.htaccess


# replace www.EXAMPLE.COM with your website address.
# be sure to add any extra paths required to locate the index.php
# then relocate this file to the root of your store's folders, and rename it as .htaccess

ErrorDocument 404 http://www.EXAMPLE.COM/index.php?main_page=page_not_found


The above code forces a 302 redirect; the browser makes a new request for that new URL and if a '200 OK' HTTP status code is then returned, the 'non-existent' pages will continue to be indexed.

Even if that second URL request does result in a '404 Not Found' HTTP status code being returned, the prior 302 redirect has already done the damage in not directly returning a 404 status at the originally requested URL.


That code should, of course, be:

ErrorDocument 404 /index.php?main_page=page_not_found


at the very least, but in case of failure of the PHP scripting itself, the error document would be much better off as a static HTML file.

I am now wondering how many tens of thousands of sites have this incorrect code installed?
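
A minimal sketch of the static-file variant mentioned above, assuming a hand-written page saved as /404.html in the site root (the filename is hypothetical, not something ZenCart ships):

```apache
# Local URL-path (leading slash, no scheme or hostname): Apache serves this
# file in place, so the originally requested URL itself returns the 404 status.
# /404.html is an assumed filename; use whatever static page you create.
ErrorDocument 404 /404.html
```

Because the target is plain HTML, it can still be served when the PHP scripting fails.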

g1smd

12:12 am on Mar 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Site has a large number of
http://www.example.com/index.php?main_page=popup_image&pID=48000 
type URLs indexed.

These are popup images linked from product pages. When visited from SERPs, they have no way to un-popup, and no links back to the product page, the category page or the root index page of the site. There's no way into the rest of the site from these things.

One swift fix was to simply add:

Disallow: /?main_page=popup_image
Disallow: /index.php?main_page=popup_image


to the robots.txt file.

g1smd

7:59 am on Mar 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I can see that pages are slow to load and/or render once the top header and left-side navigation have appeared.

Google WebmasterTools says this particular site is slower than 80% of other sites on the web. It says some pages are taking up to 12 seconds to load. While I don't think that number is right, the site certainly needs speeding up.

This is looking like a challenge!

g1smd

12:38 am on Mar 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



ZenCart has the most ugly URLs imaginable. I can't begin to understand what the designers might have been thinking.

URL rewriting seems like a 'must do', but I am not sure whether to put any trust in the URL rewriting add-ons that I have seen so far, especially when they come with inconsistent and abysmal code like this:

www
# trailing slash
# If URL-path does not contain a period or end with a slash
#RewriteCond %{REQUEST_URI} !(\.|/$)
# add a trailing slash
#RewriteRule (.*) http://www.example.com/$1/ [R=301,L]
#
# Redirect non-www domain requests to www domain
RewriteCond %{HTTP_HOST} ^example\.
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

non-www
# trailing slash
# If URL-path does not contain a period or end with a slash
#RewriteCond %{REQUEST_URI} !(\.|/$)
# add a trailing slash
# RewriteRule .* http://example.com%{REQUEST_URI}/ [R=301,L]
#
# Canonicalize the domain
# Redirect non-www domain requests to www domain
RewriteCond %{HTTP_HOST} ^www\.example\.
RewriteRule (.*) http://example.com%{REQUEST_URI} [R=301,L]

g1smd

10:19 pm on Mar 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The above code, corrected:

# www
# Domain canonicalisation
# Redirect all non-www domain requests, and www with port number, to www domain
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


# non-www
# Domain canonicalisation
# Redirect all www domain requests, and non-www with port number, to non-www domain
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^example\.com$
RewriteRule (.*) http://example.com%{REQUEST_URI} [R=301,L]


In fixing domain canonicalisation issues, there was at least one dilemma in deciding whether to redirect all pages of the site to the www version or to the non-www version.

Most of the pages of the site were already listed in Google as non-www URLs. All of the internal linking also pointed at non-www URLs (set in configure.php file).

The root index page was listed in Google as www.example.com/ and also as the duplicate example.com/index.php?main_page=index listing. The root index page was the only page listed as a www URL. All incoming external links also pointed to www.example.com/.

Google was spidering both www and non-www URLs, but spidering the non-www URLs a lot more than www URLs.

The decision was made to implement a redirect from non-www to www (because the root page was already correctly listed, and external links already pointed at that version).

Once implemented, the site was almost completely reindexed as using www URLs within only a matter of days. The non-www URLs which now redirect are still indexed, but are expected to drop out over the coming months.

[edited by: jdMorgan at 2:50 am (utc) on Mar 7, 2010]
[edit reason] Edited at member's request. [/edit]

g1smd

10:57 pm on Mar 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Fixing the duplicate URLs for the root index page, as well as adding the general rule for domain canonicalisation gives:

# Index Redirect for /index.php with no parameters
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/? [R=301,L]


# Index Redirect for /index.php?main_page=index and /?main_page=index URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(index\.php)?\?main_page=index\ HTTP/
RewriteRule ^(index\.php)?$ http://www.example.com/? [R=301,L]


# Canonical Redirect non-www to www [EXCLUDE /admin requests]
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteCond %{REQUEST_URI} !^/admin [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

jdMorgan

6:02 pm on Mar 5, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Suggested (shorter) equivalent for last rule above:

# Canonical Redirect non-www to www [EXCLUDE /admin requests]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{REQUEST_URI} !^/admin [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Jim

g1smd

12:48 am on Mar 6, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That works too. Thanks Jim.

Shorter, yes, but is it any quicker? I have no idea on that score.

jdMorgan

1:30 am on Mar 6, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In .htaccess, yes. In a server config file, no. That's because .htaccess is re-compiled on the fly for every HTTP request, whereas code in httpd.conf (for example) is only compiled once at server start-up. Andreas_Firedrich posted some benchmark test results several years ago that made these differences obvious.

So a good rule of thumb is, in .htaccess, use regex 'OR', and in server config files, use multiple-RewriteCond [OR].

Jim
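
To illustrate the rule of thumb above, here is a hedged sketch of the same test written both ways. The second hostname (example.net) is a hypothetical extra domain added only so that there is something to 'OR'; it does not come from the thread.

```apache
# .htaccess form: one RewriteCond, alternation done inside the regex
RewriteCond %{HTTP_HOST} ^(example\.com|example\.net)$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# Server config form: one RewriteCond per hostname, joined with [OR].
# In server or virtual-host context the pattern is matched against the full
# URL-path, so the leading slash has to appear in the pattern.
RewriteCond %{HTTP_HOST} ^example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^example\.net$
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
```

Use one form or the other, not both; they are shown together only for comparison.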

g1smd

9:44 am on Mar 6, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That makes sense. Cheers!

kuzman05

9:02 pm on Mar 19, 2010 (gmt 0)

10+ Year Member



I've got an issue with ZenCart redirects. In short: ZenCart resides at www.site.com/shop and I want to move it to just www.site.com. Currently anybody hitting www.site.com is forwarded to the /shop folder where the cart is. Because I have a lot of inbound links to /shop pages, I want all of those to be 301 redirected to their counterparts on www.site.com, where I copied the entire cart.

I've played with many combinations in .htaccess in both the webroot and the /shop folder, but nothing got me what I needed. Any insights/advice? Here is the current .htaccess in the /shop folder:
#### BOF SSU
Options +FollowSymLinks -MultiViews
RewriteEngine On

# Make sure to change "test_site" to the subfolder you install ZC. If you use root folder, change to: RewriteBase /
RewriteBase /

# Deny access from .htaccess
RewriteRule ^\.htaccess$ - [F]

RewriteCond %{SCRIPT_FILENAME} -f [OR]
RewriteCond %{SCRIPT_FILENAME} -d
RewriteRule .* - [L]

RewriteRule ^(.*) index.php?/$1 [E=VAR1:$1,QSA,L]
#### EOF SSU

g1smd

10:44 pm on Mar 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's a boatload of issues with ZenCart. There are multiple URLs for the root index page, for starters.

This redirects the folder requests to root as well as fixes requests for
index.php
and
index.php?main_page=index
and
?main_page=index
and
index.php?main_page=index&cPath=
and
?main_page=index&cPath=
too.

You might have already noticed the mess these duplicate URLs make in your analytics.

# Index Redirect for /(zencart/)index.php with no parameters
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(zencart/)?index\.php\ HTTP/
RewriteRule ^(zencart/)?index\.php$ http://www.example.com/? [R=301,L]
#
# Index Redirect for /(zencart/)(index.php)?main_page=index(&cPath=) URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(zencart/)?(index\.php)?\?main_page=index(&cPath=)?\ HTTP/
RewriteRule ^(zencart/)?(index\.php)?$ http://www.example.com/? [R=301,L]
#
# Redirect all /zencart/ folder URL requests
RewriteRule ^zencart/(.*)$ http://www.example.com/$1 [R=301,L]
#
# Canonical Redirect non-www to www [EXCLUDE /admin requests]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{REQUEST_URI} !^/admin [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


Make sure you don't simply move the files from the ZenCart folder to the root folder. If you do, you'll find some internal links to images and files might include a leading rogue
../
construct that is not required. You might need to reinstall the code, or at least edit some configuration data somewhere.

[edited by: g1smd at 10:57 pm (utc) on Mar 19, 2010]

kuzman05

10:56 pm on Mar 19, 2010 (gmt 0)

10+ Year Member



Thanks for the prompt reply. I'll try it tonight when traffic dies down. BTW, I did notice the multiple formats that the index page was being called by.

g1smd

1:20 am on Mar 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Once it is working, I would make one more change.

For efficient running of the site, I would make two copies of the code above, and then put one in the .htaccess file in the root folder and another in the /zencart/ folder.

I would edit each copy to localise the URL requests that it responds to.

In the file in the root, delete the 'zencart/' part out of all the conditions and rules (and remove the associated ( )? punctuation too), and delete the entire third rule.

In the file in the /zencart/ folder delete the zencart/ part from every rule, but not from any of the conditions. Remove the ( )? punctuation associated with all of the zencart/ parts of all patterns.
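
As a sketch of how those instructions read when applied literally to the code posted above (assuming the store now lives in the root, the old folder is /zencart/, and each file turns the rewrite engine on itself), the two files might look like this. First, the .htaccess in the root folder:

```apache
RewriteEngine On

# Index Redirect for /index.php with no parameters
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/? [R=301,L]
#
# Index Redirect for /(index.php)?main_page=index(&cPath=) URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(index\.php)?\?main_page=index(&cPath=)?\ HTTP/
RewriteRule ^(index\.php)?$ http://www.example.com/? [R=301,L]
#
# Canonical Redirect non-www to www [EXCLUDE /admin requests]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{REQUEST_URI} !^/admin [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

Then the .htaccess in the /zencart/ folder. In a per-directory file the RewriteRule pattern sees the path with the folder prefix already stripped, which is why 'zencart/' goes from the rules but stays in the THE_REQUEST conditions:

```apache
RewriteEngine On

# Index Redirect for /zencart/index.php with no parameters
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /zencart/index\.php\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/? [R=301,L]
#
# Index Redirect for /zencart/(index.php)?main_page=index(&cPath=) URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /zencart/(index\.php)?\?main_page=index(&cPath=)?\ HTTP/
RewriteRule ^(index\.php)?$ http://www.example.com/? [R=301,L]
#
# Redirect all other /zencart/ folder URL requests to the same path in the root
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
# Canonical Redirect non-www to www [EXCLUDE /admin requests]
# Kept for fidelity to the instructions; the catch-all rule above always
# matches first, so this rule is never reached in this file.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{REQUEST_URI} !^/admin [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```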

kuzman05

8:03 am on Mar 20, 2010 (gmt 0)

10+ Year Member



Well, it worked. I had to tweak it a bit, but here is the combo that did it for the /shop/ folder:
# Index Redirect for /(shop/)index.php with no parameters
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(shop/)?index\.php\ HTTP/
RewriteRule ^index\.php$ [site.com...] [R=301,L]
#
# Index Redirect for /(shop/)(index.php)?main_page=index(&cPath=) URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(shop/)?(index\.php)?\?main_page=index(&cPath=)?\ HTTP/
RewriteRule ^(index\.php)?$ [site.com...] [R=301,L]
#
# Redirect all /shop/ folder URL requests
RewriteRule ^(.*)$ [site.com...] [R=301,L]
#
# Canonical Redirect non-www to www [EXCLUDE /admin requests]
RewriteCond %{HTTP_HOST} !^(www\.site\.com)?$
RewriteCond %{REQUEST_URI} !^/admin [NC]
RewriteRule (.*) [site.com...] [R=301,L]


and this is what I have for the webroot folder:
RewriteCond %{SCRIPT_FILENAME} -f [OR]
RewriteCond %{SCRIPT_FILENAME} -d
RewriteRule .* - [L]
RewriteRule ^(.*) index.php?/$1 [E=VAR1:$1,QSA,L]



*** Just one more question: in the webroot .htaccess file, under the code mentioned above, there are some 20+ rules for separate iDevAffiliate software that resides in the webroot/affiliate/ folder. Will those rules be affected by my ZenCart rules, and if so, what is my recourse?
Thank you for your help.

g1smd

8:10 am on Mar 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You did the first bit, but not the second. "In the file in the /zencart/ folder delete the zencart/ part from every rule, but not from any of the conditions. Remove the ( )? punctuation associated with all of the zencart/ parts of all patterns."

In your root folder you do also need the first two and the last rule, again modified (from the original in post #:4101329 above) as shown. "In the file in the root, delete the 'zencart/' part out of all the conditions and rules (and remove the associated ( )? punctuation too), and delete the entire third rule."

g1smd

8:10 am on Mar 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Use example.com to stop forum auto-linking.

kuzman05

10:28 pm on Mar 22, 2010 (gmt 0)

10+ Year Member



After implementing the second part (deleting the ( )? punctuation), I get infinite redirects.

g1smd

2:23 pm on Mar 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That shouldn't happen.

You're making the file in the root work for all URL requests except for those in the /zencart/ folder, and you're making the file in the /zencart/ folder deal with URL requests for URLs that map to that folder.

g1smd

3:01 pm on Mar 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



WebmasterTools exposed yet more design errors with the ZenCart site last week.

The individual products have horrible URLs like
www.example.com/index.php?main_page=product_info&cPath=567&products_id=45678
which already exposes the system to multiple potential Duplicate Content problems.

Additionally, each product page has buttons marked 'prev', 'listing' and 'next' which when clicked return the user either to the product category index page or go on to view the 'previous' or 'next' products in that list.

When a product is shown in a 'sale' or 'special offers' category there will be multiple linked-to URLs that can return that product. This is because it is the products_id variable that pulls the content for display; the cPath category ID can be anything, and is a major source of Duplicate Content issues. In this case the product will appear in both the regular category and the 'sale' category, each with a different cPath category ID parameter value (but the same products_id parameter value).

Once the sale is over, and the item is removed from the 'special offers' section, but continues to remain on sale in another category, the problems are far from over.

The previously published URL for the 'sale' item continues to work, returning a 200 OK status and the correct content for the product. It does this because the 'cPath' parameter value plays no part in deciding whether to show a product or not. The 'special offers' version of the page therefore continues to show in search results and rank for those terms, but is now an orphan page with no incoming links to it within the site. This is a dangerous situation in its own right.

Click the link in the SERP to visit the page and this is what you will see. The page lists the product as 'Item 1 of x' where x is the actual number of products in the 'parent category', even though this item is no longer listed in that category. The 'listing' link correctly links to the 'special offers' category, but the 'prev' and 'next' links are now truly broken.

They both point to a URL like
www.example.com/index.php?main_page=product_info&cPath=567&products_id=
with a blank products_id value. Once the 'prev' or 'next' links are clicked to visit the broken URL, the user sees a 'product cannot be found' error message. Using Analytics it could be seen that a fair number of people exit the site at just such a URL.

It would be a lot of work to edit the ZenCart script to instead link to the correct 'previous' and 'next' items in the category list. A simple and temporary fix has been to redirect malformed 'next' and 'prev' links to instead take the visitor to that 'parent' category index page. Ideally the visitor should be taken to the current product page in the correct category, but that also requires a lot of script editing.

The following code accepts a request like
/index.php?main_page=product_info&cPath=567&products_id=<blank>
with blank products_id value or
/index.php?main_page=product_info&cPath=567
with products_id parameter missing, and redirects the user to the
www.example.com/index.php?main_page=index&cPath=567
category index instead.

# Fixes broken 'prev' and 'next' links from removed 'special offers' and other such product pages
# Index Redirect for /(index.php)?main_page=product_info&cPath=<anything>(&products_id=<blank>) URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(index\.php)?\?main_page=product_info&cPath=([0-9_]+)(&products_id=)?\ HTTP/
RewriteRule ^(index\.php)?$ http://www.example.com/index.php?main_page=index&cPath=%2 [R=301,L]


While far from ideal as a 'fix', customers no longer see a 'product cannot be found' error message when they click from page 1 to go to page 2. They now see the rest of the current 'special offer' items. It might confuse them that this product does not appear to be listed there, but fixing that requires major changes to the way ZenCart works internally.

ZenCart doesn't do all the right checks on the URL request to ensure that content should not be returned for non-valid URLs. This is a serious flaw also found in most other forum, cart, blog, and CMS products.

BobbyStar

5:30 am on Mar 26, 2010 (gmt 0)

10+ Year Member



Hi,
I have these same issues. I'm using the free Ceon URI Mapping add-on with Zen Cart's latest version and have added the code below to my .htaccess file. Would this conflict with the code in post #:4091770 and #:4103788 if I use them all together?


# Don't rewrite any URIs ending with a file extension (ending with .[#*$!x])
RewriteCond %{REQUEST_URI} !\.[a-zA-Z]{2,4}$
# Don't rewrite admin directory
RewriteCond %{REQUEST_URI} !^/admin name.* [NC]
# Don't rewrite editors directory
RewriteCond %{REQUEST_URI} !^/editors.* [NC]
# Don't rewrite cPanel directories
RewriteCond %{REQUEST_URI} !/cpanel.* [NC]
RewriteCond %{REQUEST_URI} !/frontend.* [NC]
# Handle all other URIs using Zen Cart (index.php)
RewriteRule (.*) index.php?%{QUERY_STRING} [L]

I just started using the add-on about a week ago. It makes category and product page names look nice, but I still have tons of duplicates indexed.

G1smd thanks for starting this topic and posting your work. Also thanks for having me here.

g1smd

8:32 am on Mar 26, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The code in #:4091770 should work for any ZenCart site, but do make sure the redirects are listed before any rewrites. Also make sure that if you have any Redirects using Redirect or RedirectMatch that you convert them to use RewriteRule instead.

The code in #:4103788 would need to be modified to produce "friendly" URLs. The best way would be for it to rewrite to a script that looks up the category data in the database and then sends out the redirect HEADER. That's because there is no way for .htaccess to 'know' the category and product names.

Another solution for the latter problem would be to edit the ZenCart scripts and directly fix the problems there. ZenCart doesn't sanity check the URLs it creates before adding them to the HREF on the page. There's scope for a 'canonicalisation' module to be added there.
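
A hedged sketch of the ordering advice above, assuming a single root .htaccess. The old-page.html and new-page.html names are hypothetical stand-ins for whatever Redirect or RedirectMatch lines a site already has, and the final rewrite is a generic placeholder rather than any particular add-on's code.

```apache
RewriteEngine On

# 1. External redirects first.
# A mod_alias line such as
#   Redirect 301 /old-page.html http://www.example.com/new-page.html
# is converted to the mod_rewrite equivalent:
RewriteRule ^old-page\.html$ http://www.example.com/new-page.html [R=301,L]
#
# Canonical Redirect non-www to www [EXCLUDE /admin requests]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{REQUEST_URI} !^/admin [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# 2. Internal rewrites last, so that no redirect can expose a rewritten path.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php [L]
```

Keeping every redirect in mod_rewrite means the order in the file is the order of execution, which cannot be relied on when mod_alias and mod_rewrite directives are mixed.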

jdMorgan

12:33 pm on Mar 26, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



BobbyStar,

That code could do with some improvements -- There are several inefficiencies in it:

# Don't rewrite requests for any URIs ending with a file extension (ending with .xyz)
RewriteCond $1 !\.[a-z]{2,4}$ [NC]
# Don't rewrite requests for admin, editors, cpanel, or frontend directories
RewriteCond $1 !^(admin\ name|editors|cpanel|frontend) [NC]
# Handle all other URIs using Zen Cart (index.php)
RewriteRule ^(.*)$ index.php [L]

Now much shorter and faster, but no changes *at all* to function...

To avoid duplicate-content problems, I would recommend removing the [NC] from the directories exclusion and adding rules to redirect any mis-cased requests for those directories to the correctly-cased URL. However, since it's not likely that you'll want those directories to rank in search, in this specific case it is not a critical issue.

Jim
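
One way the mis-cased redirect suggested above might look, sketched for the editors directory only and assuming the folder is named in lowercase on disk. It would go before the rewrite to index.php, with similar rules added for the other excluded directories.

```apache
# Redirect /Editors, /EDITORS/... and so on to the lowercase URL.
# The RewriteRule pattern matches case-insensitively because of [NC]; the
# case-sensitive RewriteCond then skips requests that are already lowercase,
# which prevents a redirect loop.
RewriteCond $1 !^editors$
RewriteRule ^(editors)(/.*)?$ http://www.example.com/editors$2 [NC,R=301,L]
```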

BobbyStar

3:33 am on Apr 1, 2010 (gmt 0)

10+ Year Member



g1smd,
I must be overlooking something. I cannot find the file (htaccess_for_page_not_found_redirects.htaccess) that you spoke of in your first post. The directories I have in version 138a are extra_configures, extra_datafiles and extra_cart_actions. None have that file in them. Are you working on the same version? Thanks.

BobbyStar

4:04 am on Apr 1, 2010 (gmt 0)

10+ Year Member



jdMorgan,

Before I attempt to write any rules to redirect requests... I need to do a lot of reading. The faster code worked fine. Thanks.