http://avplay.example.com/The Dirty Dozen Blu-ray Review/8741.html
to be rewritten to
http://avplay.example.com/index.php?showreview=8741
How would I do this, please?
Thanks in advance
[edited by: jdMorgan at 12:55 am (utc) on Feb. 11, 2008]
[edit reason] example.com [/edit]
As for the rewrite, this question comes up almost every day. Have a look at some earlier threads here for pointers, and at the sticky posts pinned at the top of the forum.
Your example is very dangerous because I guess that it will accept www.domain.com/any-random-words-i-care-to-insert-here/8741.html, and that is a major duplicate-content issue waiting to happen. You are almost inviting people to abuse your site by making bogus URLs indexable, unless your script also checks the requested URL against the actual title of the page and returns a 404 error if they do not match. That's a couple of extra lines in your script, rather than in the .htaccess file.
The numeric bit at the end is the unique review ID which is the only bit of the URL which is used. The review title will be completely ignored and is for search engine use only.
I'll have a hunt around, but I'm clueless about the rewrite part, and frankly I would rather someone just told me what to put, as I need this kind of thing too rarely to make learning it worthwhile (I'm not lazy, I just have a massive forum to run).
We could have anything in the url really. E.g. something like
http://avplay.example.com/the-dirty-dozen-blu-ray-review/showreview/1234.html
using the showreview bit to identify that we want to run the showreview script
and the 1234 numeric bit is the unique review id.
So it gets translated to
http://avplay.example.com/index.php?showreview=1234
[edited by: Stuart_Wright at 9:23 am (utc) on Feb. 11, 2008]
[edited by: jdMorgan at 3:49 pm (utc) on Feb. 11, 2008]
[edit reason] No URLs, please. Please see Terms of Service. [/edit]
Such "I can't be bothered" statements don't exactly motivate members to help you here.
And if you like vBulletin, compare a vBulletin forum with this one on dial-up. Now try it on a PDA or cell phone. Be sure you do these tests while in Europe, or in a region where you must pay by the kilobyte, too. This site is fast and efficient, and forgoes the kind of 'fluff' that bloats up other forums.
The rule you seek is trivial.
RewriteRule ^[^/]+/([0-9]+)\.html$ /index.php?showreview=$1 [L]
Jim
[edit] Amended reference to duplicate-content warning. [/edit]
[edited by: jdMorgan at 10:21 pm (utc) on Feb. 11, 2008]
Perhaps you could explain this duplicate content threat you refer to as I don't understand it.
I'll go look up 'canonical' in a dictionary now as it's not a term I am familiar with.
As I say, the text part of the link is irrelevant and it is the numeric code prior to the '.html' which is the important bit.
I'm baffled that the text part of the link is at all relevant and that someone could or would want to drive our site out of SERPs (another term I'll go look up).
Your help is appreciated. Thank you.
[edited by: Stuart_Wright at 6:52 pm (utc) on Feb. 11, 2008]
> the text part of the link is irrelevant.
It's irrelevant to you, but not to search engines... Let's say I'm a competitor, and I want to cause you grief. I can link to your /<irrelevant>/<idNumber>.html URLs like this:
example.com/real-junk-on-this-site/1234.html
example.com/more-defective-products/1234.html
example.com/OSHA-recalled-products/1234.html
example.com/major-online-scammers/1234.html
Any and all of those links will return the "showreview=1234" page. So not only do you get the "benefit" of all those "nice" keywords in the linked URL, you also get the same content showing up at all four of them -- And do realize that since the first part of the URL-path is ignored, the combinations for potentially-malicious links (or even just misspelled URLs) are endless.
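To see why every one of those links lands on the same page, here is a minimal Python sketch that applies the same pattern as the RewriteRule quoted earlier in the thread. It only imitates the pattern match; it is not mod_rewrite itself, and the `rewrite` helper is a hypothetical name used for illustration.

```python
import re

# Same pattern as the RewriteRule quoted earlier in the thread. In .htaccess
# the leading slash is stripped from the path before the pattern is applied.
RULE = re.compile(r"^[^/]+/([0-9]+)\.html$")

def rewrite(path):
    """Return the internal target for a requested path, or None if no match."""
    match = RULE.match(path)
    if match is None:
        return None
    return "/index.php?showreview=" + match.group(1)

paths = [
    "the-dirty-dozen-blu-ray-review/1234.html",
    "real-junk-on-this-site/1234.html",
    "major-online-scammers/1234.html",
]

# Every path collapses to one target because the text part is never inspected.
targets = {rewrite(p) for p in paths}
print(targets)  # {'/index.php?showreview=1234'}
```

Three different public URLs, one internal target: that is the duplicate-content exposure in a nutshell.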
Now Google hates duplicate content; they don't want the same content to appear in their index under more than one "canonical" URL. They will remove all duplicate URL listings, and *they* get to pick which ones they remove, taking the choice out of your control. They *are* influenced, however, by incoming links and all other rank-weighting factors, so the "control" may actually go to your malicious competitor if he tries hard enough. This is, as we say in the business, "non-optimal."
Therefore, I recommend you check these 'irrelevant' URL-path-parts in your script, because they are in fact quite important.
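A rough sketch of such a check, written in Python purely for illustration (the real site runs index.php, so this would need porting). It assumes the script receives both the text part of the URL and the review ID; `REVIEW_TITLES`, `slugify` and `check_request` are hypothetical names, and the hyphenated lower-case URL form is an assumption about how the titles are turned into links.

```python
import re

# Hypothetical stand-in for the review database: review ID -> title.
REVIEW_TITLES = {1234: "The Dirty Dozen Blu-ray Review"}

def slugify(title):
    """Lower-case the title and join its runs of letters/digits with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))

def check_request(text_part, review_id):
    """Return 200 if the URL text matches the review's title, else 404."""
    title = REVIEW_TITLES.get(review_id)
    if title is None or text_part != slugify(title):
        return 404  # unknown review, or bogus text in the URL
    return 200

print(check_request("the-dirty-dozen-blu-ray-review", 1234))  # 200
print(check_request("major-online-scammers", 1234))           # 404
```

With a check like this, only one text part per review ID returns content, so bogus variants never become indexable.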
...
If you do URL rewriting from a /subdirectory/pageID form to a /page?names=args form, then the browser resolves all page-relative included-object links on the page by using the domain/directory/subdirectory/ path currently indicated in its address bar, and appending the relative link. Search engine spiders use the same rule, although they don't have an address bar per se.
Therefore a link such as
<img src="images/logo.gif"> appearing on your page at example.com/blue-ray/1234.html will be requested from the canonical URL example.com/blue-ray/images/logo.gif -- Probably not what you want.
The solution is to use server-relative links, such as
<img src="/images/logo.gif">
or canonical links, such as
<img src="http://example.com/images/logo.gif">
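The resolution rule can be checked with Python's standard `urllib.parse.urljoin`, which follows the same relative-reference rules a browser uses. This is only a sketch; the page URL and image paths are the example values from this post.

```python
from urllib.parse import urljoin

# The URL the browser believes it is on (the rewritten one, not index.php).
PAGE = "http://example.com/blue-ray/1234.html"

# Page-relative link: resolved against the page's apparent directory.
relative = urljoin(PAGE, "images/logo.gif")

# Server-relative link: resolved against the site root.
server_relative = urljoin(PAGE, "/images/logo.gif")

# Full URL: used as-is.
absolute = urljoin(PAGE, "http://example.com/images/logo.gif")

print(relative)         # http://example.com/blue-ray/images/logo.gif
print(server_relative)  # http://example.com/images/logo.gif
print(absolute)         # http://example.com/images/logo.gif
```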
I've used the term "canonical" in several ways here. Like many words, the intended meaning or "focus" varies according to context. But as used here, the meaning revolves around the concepts of "right and usual," "complete," "orthodox," or "conforming to the rules."
So a canonical URL is one that is the "one and only" URL that can be used to reach any given content.
A canonical URL may also mean the "full, formal URL" as in http://www.example.com/blue-ray/1234.html
Definition [google.com]
[added] SERPs: Search Engine Results Pages [/added]
Jim
[edited by: jdMorgan at 7:23 pm (utc) on Feb. 11, 2008]
But I guess I'm going to need a $2?
One last favour - how do you pass the contents between one pair of slashes as $1 (the text element) and the number before the '.html' (the review id) as the $2?
All my scripts use canonical links, so I'm safe there.
Many thanks for your generous advice.
Jim
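On the $2 question: in a regular expression, each parenthesised group is numbered left to right, so wrapping the text part in its own parentheses makes it $1 and the number becomes $2. The Python sketch below imitates that with two capture groups. The `title` query parameter and the `rewrite_with_title` helper are hypothetical names chosen for illustration, and the directive in the comment is an untested suggestion, not a rule given in this thread.

```python
import re

# Two capture groups: group 1 is the text part, group 2 is the review ID.
# A possible mod_rewrite form of the same idea (assumption, untested):
#   RewriteRule ^([^/]+)/([0-9]+)\.html$ /index.php?showreview=$2&title=$1 [L]
RULE = re.compile(r"^([^/]+)/([0-9]+)\.html$")

def rewrite_with_title(path):
    """Return the internal target carrying both captures, or None if no match."""
    match = RULE.match(path)
    if match is None:
        return None
    text_part, review_id = match.group(1), match.group(2)
    return "/index.php?showreview=" + review_id + "&title=" + text_part

print(rewrite_with_title("the-dirty-dozen-blu-ray-review/1234.html"))
# /index.php?showreview=1234&title=the-dirty-dozen-blu-ray-review
```

The script can then compare the received text part against the stored title before serving the page.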