old url rewrite plugin urls, reg expression - oscommerce

url rewrites, regular expressions, oscommerce

         

soundzgood2

10:58 am on Apr 21, 2011 (gmt 0)

10+ Year Member



Hello WebmasterWorld contributors,
I'm new to the subject of url rewriting and regular expressions and hope you can assist.

The site has been using a plugin that created urls like this:
www.mysite.com/Costume_Accessories/c1_2/p123/LONG_NYLON_GLOVES_-_WHITE_/product_info.html (OLD URL)

The new plugin creates urls that look like this:
www.mysite.com/long-nylon-gloves-white (NEW URL)

and I understand that for SEO reasons the old URLs need to be redirected to the new ones.

-----
In htaccess I have found that the following will work to rewrite the old url:

RewriteCond %{REQUEST_URI} ^/Costume_Accessories-Accessories/c6_8/p365/LONG_NYLON_GLOVES_-_WHITE_/product_info\.html$
RewriteRule ^(.*) http://www.mysite.com/long-nylon-gloves-white [R=301,L]

There are of course many old URLs, so adding hundreds of these rules isn't practical and would not help the site's load time.

I'm aware of regular expressions but not how to write one that would create the new url format ... can you help?

Thanks for your time,
Simon

g1smd

6:21 pm on Apr 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There is nothing in the path and/or filename parts of the old URL that can be re-used in the new URL. This effectively means that mod_rewrite cannot be used to formulate the new URL based on the path part of the old.

You should instead internally rewrite all URL requests of the "old" type to a special PHP script which performs a database lookup to find out what the "new" URL should be. This special script then issues the HTTP headers for the 301 redirect to the new URL.
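
The lookup step described above can be sketched roughly as follows. This is only an illustration in Python (the post suggests a PHP script). It assumes the "p123" segment of the old URL is the product id, uses a plain dictionary as a stand-in for the database table, and the names NEW_SLUGS, OLD_URL and redirect_for are made up for the example.

```python
import re

# Hypothetical stand-in for the database lookup: product id -> new URL slug.
NEW_SLUGS = {123: "long-nylon-gloves-white"}

# Assumed shape of the old URLs:
#   /<category>/c<category ids>/p<product id>/<product name>/product_info.html
OLD_URL = re.compile(r"^/[^/]+/c[0-9_]+/p(\d+)/[^/]+/product_info\.html$")


def redirect_for(path, host="www.mysite.com"):
    """Return (status, headers) for a request to an old-style URL path."""
    match = OLD_URL.match(path)
    if match is None:
        # Not an old-style URL, so this script has nothing to redirect.
        return 404, {}
    slug = NEW_SLUGS.get(int(match.group(1)))
    if slug is None:
        # Old-style URL for a product that is not in the lookup table.
        return 404, {}
    # Known product: answer with a permanent redirect to the new URL.
    return 301, {"Location": f"http://{host}/{slug}"}
```

With this shape, .htaccess only needs one generic rule that hands every old-style request to the script, and the per-product mapping lives in the database instead of in hundreds of rules.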

After all these years, URL handling in both osCommerce and Zencart is still abysmal.

enigma1

6:01 pm on Apr 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"After all these years, URL handling in both osCommerce and Zencart is still abysmal."

There are contributions that do exactly what you say: they generate 100% static URLs and work at the application level. The mods you need to make in the .htaccess are generic and minimal (a couple of lines). For example, SEO-G for osCommerce includes a redirection table where you can enter the old URLs and their targets. You don't need to bloat the .htaccess.

jdMorgan

5:23 pm on Apr 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's true -- there are contributions to do this. However, it's likely that g1smd's comment was addressed to the poorly-designed .htaccess code that comes with these carts and with their contributions. That code has been the subject of several "fix-up projects" here at WebmasterWorld.

One of the major flaws typical in this .htaccess code is that it does unnecessary "file-or-directory-exists" checking. This is typically due to incorrect RewriteCond order and to failure to explicitly exclude both the rewrite target path itself and the filetypes which should never be rewritten to a script (e.g. images, CSS, external JS files). A quick fix-up of the code can result in dramatic server performance improvements.
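
To illustrate the ordering point, here is a small model of the decision in Python rather than real mod_rewrite directives. It is a sketch under assumptions: the front controller path /index.php, the suffix list, and the function name should_rewrite are all hypothetical. The idea is that the cheap string tests run first, so the filesystem is only consulted for requests that could actually need the script.

```python
# Hypothetical list of file types that should never be handed to the script.
STATIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")

# Hypothetical rewrite target (the script that friendly URLs are sent to).
FRONT_CONTROLLER = "/index.php"


def should_rewrite(path, exists_on_disk):
    """Decide whether a request path should be rewritten to the script.

    exists_on_disk is a callable standing in for the expensive
    "file-or-directory-exists" check; it is called last, and only
    when the cheap tests have not already settled the question.
    """
    # 1. Never rewrite the rewrite target itself.
    if path == FRONT_CONTROLLER:
        return False
    # 2. Never rewrite static assets; a string comparison is enough.
    if path.lower().endswith(STATIC_SUFFIXES):
        return False
    # 3. Only now pay for the filesystem lookup.
    return not exists_on_disk(path)
```

In this model a request for an image or stylesheet never triggers the existence check at all, which is the saving being described.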

These threads can be found here using a site search for the name of the cart, plus "performance," "slow," or "file exists" and "RewriteCond", or similar well-targeted search phrases.

Jim

enigma1

5:52 pm on Apr 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jim, as far as I know the stock code of these carts doesn't use rewrite rules, because they support different servers, not just Apache. Of course, due to the popularity of Apache, many addons for friendly URLs come with .htaccess files, and these are perhaps not well tested with multiple server configurations and application conditions.

However, IMO, even if the request is decoded by the .htaccess and loads a specific application file to do the URL translation, that file is responsible for distinguishing the two cases, i.e. whether the request is in a database of previously stored URLs or whether it represents a separate valid file, in which case it should be loaded and processed at that moment.

So even if, say, there is a rule that sends CSS files through a friendly URL decoder, the script itself should see that the request is not in the database and therefore load the file directly.

With folders it is different, but I would assume whoever integrates the code understands the requirements for overriding the rewrite engine inside the .htaccess of the sub-folder.

g1smd

6:21 pm on Apr 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The number of problems with these packages is huge, and they have been covered multiple times before.

There are the usual redundant -f and -d checks, and other inefficient rules, but there are many more subtle and insidious problems.

Some allow the URL request for a particular page to include any category in the category part of the URL and still serve the same content. Others use parameterised URLs with multiple parameters that expose internal site workings and provoke multiple types of Duplicate Content. Others include extra text in the URL for SEO purposes and then fail to validate that the text is the one valid piece of text for that particular page, the most famous example being:
www.amazon.com/Snape-Kills-Dumbledore-on-page-606/dp/0439785960/
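
The missing validation step can be sketched like this, in Python and under stated assumptions: the URL shape /<text>-p<id>.html, the CANONICAL_SLUGS table, and the function name respond are hypothetical and not taken from any particular cart. The point is that the id selects the page, and any request whose text differs from the one valid text for that id gets a 301 to the canonical URL instead of a 200.

```python
import re

# Hypothetical table: product id -> the one valid piece of URL text.
CANONICAL_SLUGS = {123: "my-product"}

# Assumed URL shape: /<any text>-p<product id>.html (the text is optional).
PRODUCT_URL = re.compile(r"^/(?:.+-)?p(\d+)\.html$")


def respond(path):
    """Return (status, location) for a product URL path."""
    match = PRODUCT_URL.match(path)
    if match is None:
        return 404, None
    product_id = int(match.group(1))
    slug = CANONICAL_SLUGS.get(product_id)
    if slug is None:
        return 404, None
    canonical = f"/{slug}-p{product_id}.html"
    if path != canonical:
        # Right product, wrong text: redirect rather than serve a second copy.
        return 301, canonical
    return 200, None
```

For example, a request for /my-other-product-p123.html would be redirected to /my-product-p123.html, so only one URL per product ever returns content.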

The list goes on and on, with many designers seemingly being unaware that URLs are a reference system used only "out there on the web" and that URLs in no way need to reflect the filenames and parameters used "here, inside the server". Although the move to using a front controller to run the site has increased site security in several ways, it has also made those same sites more vulnerable in other ways.

enigma1

6:48 pm on Apr 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have all osC stock versions going back to 2003. None of them enables the apache rewrite engine nor has any rewrite rules. They just don't have it. So we are talking about some addons. You can always see for yourself.

As for the issue you mentioned with the category or product text that is part of the URL, I am guessing you refer to something like:

product-p123.html

Many SEO contributions translate that, don't they? And there is the old false impression many have that it creates duplicate content, because the same page also loads if you enter:

my-product-p123.html
my-other-product-p123.html
..etc

Is that right? If so there is no duplicate.

In order to have a duplicate, the URL must be generated by the core code itself and be exposed in the page you view, and that is not the case here. What you're saying is the same argument as this: I can go to pretty much any site, append a couple of parameters to the URL, and then claim it's a duplicate because the request returned 200. No, that is not a duplicate. Just try it on this forum or anywhere you want.

This thread
[webmasterworld.com...]
is not a duplicate, because there is no code in the application that generates it. Well, OK, it may be now, after this post, if the forum picks it up as a URL.

Now URL poisoning is a different matter. That is serious, and none of the discussions I have seen talk about it. It has to do with the core code, where some of these applications don't properly filter parameters, but that's a different subject and is not .htaccess or SEO URL related.