Forum Moderators: phranque

Keywords in rewritten URLs


erichazann

2:55 pm on May 25, 2007 (gmt 0)

10+ Year Member



I am using rules like this to rewrite galleries of images. The first rule exists because my script (category.php) treats those params (1/0, i.e. catid=1&prodid=0) as the directory index, and all other values as 'detailed' views. I am putting a descriptive title in the URLs that I use as links, i.e. links look like:

http://www.example.com/category/3/10/keyword-description-of-the-image.html

The title/desc is pulled from a DB and is entirely superfluous other than being there for SEO purposes. All the images could be accessed as www.example.com/category/3/10/ as well.

RewriteRule ^category/?$ category.php?catid=1&prodid=0 [L]
RewriteRule ^category/([0-9]+)/([0-9]+)/?([^/]+)?/?$ category.php?catid=$1&prodid=$2 [L]

I haven't written it more specifically as ^category/([0-9]+)/([0-9]+)/?([^/.]+\.html)?/?$ because I may change the structure to http://www.example.com/category/3/10/keyword-description-of-the-image/ etc..

I don't have access to use RewriteMap, and plus, I will change the keyword descriptions from time to time. My questions are:

Is this a bad format? Since the description is just "thrown away" in the rewrite, could people maliciously link to http://www.example.com/category/3/10/any-old-thing-they-want.html and affect my PR, and cause duplicate content flags to be raised?

If I change the keyword descriptions, and thus the "filenames" I am linking to, should I add 301 redirects for all the old URLs if they have been indexed, even though the rewrite is taking care of this? (Since the only info that is really needed is the catid and prodid, the "filename" (.html) part is junk.)

I'm not keyword spamming in the URLs, I am just adding a little bit extra SEO, but don't want to shoot myself in the foot. I also don't intend to constantly change the keywords, but right now, a lot of them need refinement and it will be an ongoing process for a while.

Feel free to slap me and tell me what is the best practice for what I need to do.

[edited by: jdMorgan at 8:26 pm (utc) on May 25, 2007]
[edit reason] example.com [/edit]

jdMorgan

8:48 pm on May 25, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You could disambiguate the second pattern just a bit by changing the third sub-pattern:

RewriteRule ^category/([0-9]+)/([0-9]+)(/[^/]+)?/?$ category.php?catid=$1&prodid=$2 [L]

This yields a tiny efficiency gain, and it should have no impact on the matched and unmatched URL-classes.
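To see that the two versions really do accept the same URL classes, here is a quick check in Python, whose regex flavor is compatible with these patterns (the test URLs are made up for illustration):

```python
import re

# Original third sub-pattern: slash and slug matched separately, both optional
original = re.compile(r'^category/([0-9]+)/([0-9]+)/?([^/]+)?/?$')
# Revised: the slash is folded into the optional group
revised = re.compile(r'^category/([0-9]+)/([0-9]+)(/[^/]+)?/?$')

urls = [
    'category/3/10/keyword-description-of-the-image.html',
    'category/3/10/',
    'category/3/10',
    'category/x/10/',  # non-numeric catid: should be rejected by both
]

for url in urls:
    m1, m2 = original.match(url), revised.match(url)
    # Both patterns accept or reject exactly the same URLs...
    assert (m1 is None) == (m2 is None)
    if m1:
        # ...and capture the same catid and prodid
        assert m1.group(1, 2) == m2.group(1, 2)
        print(url, '->', m1.group(1, 2))
```

The only difference is what the (unused) third group captures: the revised pattern includes the leading slash, which is harmless here since only $1 and $2 appear in the substitution.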

Is this a bad format? Since the description is just "thrown away" in the rewrite, could people maliciously link to http://www.example.com/category/3/10/any-old-thing-they-want.html and affect my PR, and cause duplicate content flags to be raised?

Yes, people could maliciously link these.

If I change the keyword descriptions, and thus the "filenames" I am linking to, should I add 301 redirects for all the old urls if they have been indexed, even though the rewrite is taking care of this. (since the only info that is really needed is the catid and prodid, the filename (.html) is junk).

Yes, you should 301 the old URL to the new one, and remove the internal rewrite for the old URL(s). Whether URLs are internally rewritten does not enter into the question of whether to 301 URLs that have been replaced.
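As a sketch (the old and new slugs below are hypothetical, since the real ones live in your database), the 301 rules go above the generic rewrite so they are matched first:

```apache
# 301 one renamed slug to its new URL (example slugs are made up)
RewriteRule ^category/3/10/old-keyword-description\.html$ http://www.example.com/category/3/10/new-keyword-description.html [R=301,L]

# The generic internal rewrite for current URLs stays below the redirects
RewriteRule ^category/([0-9]+)/([0-9]+)(/[^/]+)?/?$ category.php?catid=$1&prodid=$2 [L]
```

Order matters here: if the generic rule came first, its [L] flag would stop processing and the old URL would simply be rewritten and served as-is instead of redirected.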

I'm not keyword spamming in the URLs, I am just adding a little bit extra SEO, but don't want to shoot myself in the foot. I also don't intend to constantly change the keywords, but right now, a lot of them need refinement and it will be an ongoing process for a while.

Balance this against the risk of malicious linking for dup-content creation, based on your market-area's competitiveness. If possible, apply a taxonomy to your URLs -- classify them into larger groups and put the highest-level group name first:
widgets-round-blue-fuzzy
widgets-cubic-red-smooth
wodgets-round-blue-fuzzy
...
misc-ratchet-metal

This may allow you to reject some spurious URLs as invalid by checking the higher-level URL-path-parts and rejecting those that don't actually exist. For example, in the list above, "wickets" is not present, so a request for a URL starting with "wickets" can be rejected.

Some sites' URLs can be organized/categorized/classified easily, and some can't. If the list of acceptable URL-parts gets too long, it becomes hard to maintain, and checking it takes too much time. So again, balance the utility of this approach against the actual level of 'problematic linking' in your market sector. If you find that you must validate URLs, then either get a VPS hosting account that supports RewriteMap, or do the URL validation in the script itself: you can output a 301 or 403-Forbidden header from your script, as long as you can customize it; the "junk" part of the URL is still available in PATH_INFO.
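The script-side validation amounts to: look up the canonical slug for the catid/prodid pair, 301 if the requested slug doesn't match it, and reject unknown ids outright. Your category.php is PHP; here is the equivalent logic sketched in Python with a stand-in lookup table (the real script would query the database):

```python
# Stand-in for the database table of canonical slugs, keyed by (catid, prodid)
CANONICAL_SLUGS = {
    (3, 10): 'keyword-description-of-the-image.html',
}

def validate(catid, prodid, requested_slug):
    """Return the (status, location) pair the script would send as headers."""
    canonical = CANONICAL_SLUGS.get((catid, prodid))
    if canonical is None:
        return (404, None)          # unknown catid/prodid: reject the request
    if requested_slug != canonical:
        # wrong or outdated slug: 301 to the one canonical URL
        return (301, f'/category/{catid}/{prodid}/{canonical}')
    return (200, None)              # canonical URL: serve the page normally

print(validate(3, 10, 'any-old-thing-they-want.html'))
```

This neutralizes the malicious-linking concern from the first question: every spurious slug collapses to a single canonical URL instead of producing a duplicate page.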

Also, consider that SEs don't care whether you use "category" and "product" or the shorter "cat" and "prod"... and shorter path-parts mean less dilution of the keywords that follow in the URL.

Jim