Forum Moderators: phranque

htaccess help for SEO URLs

htchallenged

2:07 am on Nov 18, 2009 (gmt 0)

10+ Year Member



I have a PHP-based site and a piece of code that renames URLs to the actual names of categories and products, so for example:
[mysite.com...]
would be:
[mysite.com...]

I have the PHP working correctly and get the desired results in the address bar; now I just need to resolve the htaccess rewrites...

The rewrite I currently use successfully for non-www to www looks like this:

RewriteEngine on
RewriteCond %{HTTP_HOST} ^mysite\.com
RewriteRule ^(.*)$ [mysite.com...] [R=permanent,L]

The one I have for the url rewrite looks like this:

RewriteEngine on
RewriteBase /
RewriteRule [^/]+-c([0-9]+)/[^/]+-s([0-9]+)/[^/]+-p([0-9]+)/ /catalog/product_info.php?cPath=$1_$2&products_id=$3 [NC,L,QSA]
RewriteRule [^/]+-c([0-9]+)/[^/]+-p([0-9]+)/ /catalog/product_info.php?cPath=$1&products_id=$2 [NC,L,QSA]
RewriteRule [^/]+-c([0-9]+)/[^/]+-s([0-9]+)/ /catalog/index.php?cPath=$1_$2 [NC,L,QSA]
RewriteRule [^/]+-p([0-9]+)/ /catalog/product_info.php?products_id=$1 [NC,L,QSA]
RewriteRule [^/]+-c([0-9]+)/$ /catalog/index.php?cPath=$1 [NC,L,QSA]

The syntax does not look like the same type and may not even be for an Apache server (hence it does not function correctly). My hope is that one of the gurus here may be able to help me format it to work on my host's Apache server.

Any help that you could offer to this htaccess newbie is beyond gratefully appreciated!

jdMorgan

2:55 am on Nov 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's a rule for your specific example -- a corrected version of the third rule in your 'stack' of rewrites above:

# Rewrite requests for /category-1-c21/product-1-s-s161/ to /catalog/index.php?cPath=21_161
RewriteRule ^category-1-c([0-9]+)/product-1-s-s([0-9]+)/$ /catalog/index.php?cPath=$1_$2 [QSA,L]

I suggest that, referring to the mod_rewrite and regular-expressions documentation cited in our Forum Charter, you take that code apart (one character at a time if necessary) until you understand how it works. Then you can modify your other rules based on your actual needs.

Using mod_rewrite without understanding it completely is a good way to kill your site and/or your business. The code you posted is quite badly coded, and it would be better to forgo URL-rewriting entirely and live with dynamic URLs than it would be to have a single typo or logical mistake in your .htaccess server configuration file.

I also hard-coded many of the substrings in the URL-path pattern and removed the [NoCase] flag in order to avoid creating 'duplicate content' problems; allowing more than one URL to access any given page's content is another good way to damage your site.

One more note: You'll also save yourself a whole lot of grief by testing only one or a few rules at a time, especially if they can interact. Try testing this one modified rule with only your domain canonicalization redirect rule; comment-out all the other rules. This will make it much simpler to determine cause from effect when testing.
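
As a rough sketch of that kind of isolated test, the file might be reduced to something like the following. The host name example.com is a placeholder (the real domain is elided in this thread), and the commented-out line merely stands in for whichever other rules you have:

```apache
RewriteEngine on
RewriteBase /

# Domain canonicalization redirect (non-www to www).
# example.com is a placeholder host, not the poster's real domain.
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# The single rule under test.
RewriteRule ^category-1-c([0-9]+)/product-1-s-s([0-9]+)/$ /catalog/index.php?cPath=$1_$2 [QSA,L]

# Every other rule stays commented out until the rule above is verified.
#RewriteRule [^/]+-c([0-9]+)/[^/]+-p([0-9]+)/ /catalog/product_info.php?cPath=$1&products_id=$2 [NC,L,QSA]
```

Once the active rule behaves as expected, uncomment the next rule and test again, so that any new misbehavior can be traced to the rule you just enabled.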

Jim

g1smd

3:22 pm on Nov 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



... and flush the browser cache each time!

Also remember that "Live HTTP Headers" (for Mozilla browsers) is your friend.

smartwork

7:48 pm on Nov 18, 2009 (gmt 0)

10+ Year Member



If I may share this thread, I am having to go in a slightly different direction from htchallenged. We are restructuring and need to redirect.

Our current search-friendlies are in the format of

[mysite.com...] (121 is just an example ID for a category, but could be 121_140 format to include subcategories)

I need to redirect to this format which is already established by our new system:

[mysite.com...]

I can line-item these as there are not that many categories, but I may have to consider a more dynamic option for all our products.

Is this logic on target for our category redirects:

RewriteBase /
RewriteRule ^catalog/default\.php/category/Category%2BName/cPath/121$ /category-name-c-121.html [QSA,L]

g1smd

10:05 am on Nov 19, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That code is for a rewrite.

It accepts a URL like example.com/catalog/default.php/category/Category%2BName/cPath/121 and looks on the server for a file called /category-name-c-121.html

I guess that isn't what you want.

So, not only is it exactly backwards, it doesn't redirect.

A redirect would also include the domain name in the target and [R=301,L].
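
To make the difference concrete, here is a minimal side-by-side sketch. The paths old-path and new-path.html and the host www.example.com are hypothetical placeholders, not URLs from this thread:

```apache
RewriteEngine on

# Internal rewrite: the client is not told anything. The requested URL
# stays in the address bar and the server serves /new-path.html instead.
#RewriteRule ^old-path$ /new-path.html [L]

# External redirect: the target is a full URL and the R=301 flag is set,
# so the server answers with a 301 and the client requests the new URL.
RewriteRule ^old-path$ http://www.example.com/new-path.html [R=301,L]
```

Only one of the two should be active for a given pattern; the rewrite form is left commented out here purely for comparison.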

smartwork

3:52 pm on Nov 19, 2009 (gmt 0)

10+ Year Member



I'm testing this, but I believe it is more on target:

RewriteBase /
RewriteRule ^default\.php/category/.*/cPath/121$ [mysite.com...] [R=301,L]

g1smd

5:11 pm on Nov 19, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Beware of that .* construct. It is greedy, promiscuous, and ambiguous, and it slows the code down a lot.

smartwork

5:17 pm on Nov 19, 2009 (gmt 0)

10+ Year Member



I hesitated, but what I was running into is category names that could have any number of words (well... up to probably 4) and along with those words came spaces or pluses creating %20 or %2B which was causing quite an unpredictable pattern to try to match. Even when I was trying to escape the %2B during testing I was running into unsuccessful matching.

Do you have a better suggestion or direction to point me in?

g1smd

5:32 pm on Nov 19, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Use a negative match (a negated character class), such as "match everything up to the next '/'", or any similar rule that can be evaluated left-to-right in just one pass.
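
A sketch of what such a negative match could look like for the category redirect discussed above. The host www.example.com is a placeholder, the leading catalog/ segment assumes the .htaccess file sits in the document root, and the target path is the new-format example from earlier in the thread:

```apache
RewriteEngine on
RewriteBase /

# [^/]+ matches one or more characters up to the next slash, so the
# category-name segment may contain spaces or plus signs, but the match
# can never run across a path separator.
RewriteRule ^catalog/default\.php/category/[^/]+/cPath/121$ http://www.example.com/category-name-c-121.html [R=301,L]
```

It is also worth knowing that RewriteRule patterns are matched against the %-decoded URL-path, so %2B reaches the pattern as a literal + and %20 as a space. That may be why attempts to match the encoded form were unsuccessful; a literal plus sign in a pattern would need to be written as \+.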

smartwork

7:37 pm on Nov 19, 2009 (gmt 0)

10+ Year Member



Such as this?

RewriteBase /
RewriteRule ^default\.php/category/[^/]+/cPath/121$ [mysite.com...] [R=301,L]

htchallenged

11:42 pm on Nov 19, 2009 (gmt 0)

10+ Year Member



Thanks so much JD, you are obviously a guru!

I have the following working thanks to your assistance.
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteRule ^category-a-c([0-9]+)/subcat-b-s-s([0-9]+)/$ /index.php?cPath=$1_$2 [QSA,L]
RewriteRule ^category-a-c([0-9]+)/$ /index.php?cPath=$1 [QSA,L]

(category-a and subcat-b are not the actual names, just stand-ins for this post; the real names are generated by my html_output.php script from the actual category and subcategory names.)

Ideally, the rewrite would work for any category or subcategory by using a wildcard in place of "category-a" and "subcat-b". That way I could reach every category and subcategory page on the site without creating a rule for each one, because there are dozens of them.

jdMorgan

12:34 am on Nov 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The main problem with that 'wildcard support' is to make sure that you won't get "collisions" with things that *are not* categories. For a simplistic example, you don't want to rewrite robots.txt or index.php itself, although that wouldn't/couldn't happen with a rule pattern as complex as the one we're discussing. However, you/we do need to nail down (as much as possible) *what* qualifies as a "category": number (min/max) of characters, character set -- alphabetic (upper/lowercase?), numeric, any special characters (hyphens, underscores?). The tighter this can be defined, the less risk of 'unexpected operation' of the rule...

Another thing to be aware of... If the category and subcategory are treated as 'wildcards' in the RewriteRule, then they need to be passed to, and validated by, your script. The validation can be done with a database call, checking the catname and subcatname against the database entry. If there is no exact match, then a 404-Not Found response should be sent to the client.

If this is not done, then I, as Mr. Weaselly Sniveling Competitor, may elect to link to your category pages as <a href="http://htchallengedsdomain.com/cheap-shoddy-goods-that-fail-after-two-days-a-c21/product-1-s-s161/">Exquisitely crappy products from htchallenged.com</a>

If I do that, guess what terms will become associated with your pages? And as we all know, everything you read on the interweb pipes is true, right?

Another 'trick' would be to link to that same page with bazillions of different URLs -- differing in that catname part of the URL. Then you've got a duplicate-content problem -- The same content appearing at more than one single solitary URL. And that's never a good thing...

If you cannot modify the script itself, then you may need to put it inside a new 'wrapper' script to validate the requested category and then invoke the 'main' script if it's OK.

Having done that, you could use
RewriteRule ^(([^/\-]+-)*)a-c([0-9]+)/subcat-b-s-s([0-9]+)/$ /index.php?cPath=$3_$4&catname=$1 [QSA,L]

And then having extended that idea to have the script also validate the subcats,
RewriteRule ^(([^/\-]+-)*)a-c([0-9]+)/(([^/\-]+-)*)b-s-s([0-9]+)/$ /index.php?cPath=$3_$6&catname=$1&subcat=$4 [QSA,L]

Note however, that I'm showing patterns that are a bit more generic than I'd like to use -- for the "collision" reason mentioned above, and also for efficiency's sake. Basically, anything but a hyphen or a slash matches those inner subpatterns, and there can be as many hyphens as you like in the catname and subcatname. So these subpatterns could likely be improved... The more specific, the better.
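
As one possible tightening, here is a sketch that assumes category and subcategory names consist only of lowercase letters and digits in hyphen-separated words, and that every category URL ends in -c plus digits and every subcategory URL in -s plus digits. Those assumptions are illustrative and would need checking against the names the site really generates:

```apache
RewriteEngine on
RewriteBase /

# $1 = category name, $2 = category id, $3 = subcategory name, $4 = subcategory id.
# (?:...) groups without capturing, so the back-reference numbers stay simple.
RewriteRule ^([a-z0-9]+(?:-[a-z0-9]+)*)-c([0-9]+)/([a-z0-9]+(?:-[a-z0-9]+)*)-s([0-9]+)/$ /index.php?cPath=$2_$4&catname=$1&subcat=$3 [QSA,L]
```

With this shape, a request for /category-1-c21/product-1-s-s161/ should pass catname=category-1 and subcat=product-1-s to the script, which still has to validate both names and send a 404 when they don't match the database.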

Jim