Forum Moderators: phranque

Duplicated content problem, but I need it : )

Duplicated content, mod rewrite, htaccess

         

punisa

12:19 pm on Feb 29, 2008 (gmt 0)

10+ Year Member



Hi,

On my site I have a section where I put various business listings.
URL for a specific business is as follows:
www.mysite.com/business/category/subcategory/location/companyname.html

example in htaccess:
RewriteRule ^([business]*)/([^/]*)/([^/]*)/([^/]*)/([^/]*)\.html$ /viewclient.php?companyname=$5 [L]

That is all pretty, but I also wish to give my clients the option of a short URL which they can put on their business cards, like so:
www.mysite/companyname/

To do so I made this in .htaccess:
RewriteRule ^([john_mechanics]*)\/$ /viewclient.php?companyname=$1 [L]

BTW, I can't use
RewriteRule ^([^/]*)\/$ /viewclient.php?companyname=$1 [L]
because I already use it for something else : )

But now we obviously have duplicated content on our hands : o

Couple of questions that puzzle me:
1) will all-mighty google punish me for this? (probably..)
2) is there a better way that I might solve this?
3) uhm, I'm an .htaccess beginner really, is my code ok? : /

Thanks in advance ! : )

Quadrille

10:19 pm on Feb 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google won't punish you, but obviously won't include two identical pages for long.

Why not simplify and put the content at www.mysite/companyname/ ?

I can't comment on the code.

LifeinAsia

10:22 pm on Feb 29, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Just redirect www.mysite/companyname/ to www.mysite.com/business/category/subcategory/location/companyname.html
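A minimal sketch of such a redirect in .htaccess, assuming the company name john_mechanics from the first post and treating category/subcategory/location as placeholders for that company's real path segments:

```apache
RewriteEngine On

# Send the short business-card URL to the one canonical long URL.
# R=301 marks the move as permanent, so search engines index only the target.
# "category/subcategory/location" are placeholders, not real values.
RewriteRule ^john_mechanics/$ http://www.mysite.com/business/category/subcategory/location/john_mechanics.html [R=301,L]
```

The long URL is then handled by the existing internal rewrite to viewclient.php, so only one URL per business ends up serving content.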

jdMorgan

5:30 am on Mar 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just be aware that this regular-expressions pattern:
^([john_mechanics]*)\/$
means "match any URL-path consisting of any number (including zero) of the characters j, o, h, n, _, m, e, c, a, i, or s, in any order, followed by a slash."

That is most likely not what you want, and I would think that simply using
^(john_mechanics)\/$
is what you intended.
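For illustration, here are the two patterns side by side, with comments on what each accepts. The names conman and mosaic are made-up examples, not paths from the site, and only one of the two rules would be used in practice. The backslash before the slash is dropped because a slash needs no escaping in a RewriteRule pattern:

```apache
# Character class plus "*": any run of the characters j o h n _ m e c a i s.
# Matches "john_mechanics/", but equally "conman/" or "mosaic/".
RewriteRule ^([john_mechanics]*)/$ /viewclient.php?companyname=$1 [L]

# Literal group: matches only "john_mechanics/".
RewriteRule ^(john_mechanics)/$ /viewclient.php?companyname=$1 [L]
```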

There is a short regular-expressions tutorial cited in our forum charter, and an even shorter one in the mod_rewrite documentation itself.

Jim

punisa

9:30 am on Mar 3, 2008 (gmt 0)

10+ Year Member



Hey Jim, thanks a lot : ))

I was just about to ask why it takes into account every character in the word, and you solved it for me before I managed to start a new question : ) Great ! : ))

Anyway, I changed everything to www.mysite.com/companyname/ and it looks quite nice. Once again, thanks everybody.

jdMorgan

9:00 pm on Mar 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I should also point out that unless you need to quantify a group of characters or a sub-expression, or back-reference it in the substitution, parentheses are not needed. Also, it's a bad idea to allow "nothing" in between slashes, since Apache treats multiple contiguous slashes as a single slash, thus exposing your rules to directory-level errors.

Therefore, this rule

 RewriteRule ^(business*)/([^/]*)/([^/]*)/([^/]*)/([^/]*)\.html$ /viewclient.php?companyname=$5 [L]

becomes either

 RewriteRule ^business/[^/]+/[^/]+/[^/]+/([^/]+)\.html$ /viewclient.php?companyname=$1 [L]

or

 RewriteRule ^business[^/]*/[^/]+/[^/]+/[^/]+/([^/]+)\.html$ /viewclient.php?companyname=$1 [L]

depending on what you're trying to do with "business": either match it exactly or allow for zero or more trailing characters.

Be aware that since the URL-path-parts matching your old $1 through $4 are irrelevant to the rewrite, it appears that you've created an opportunity here for duplicate-content problems. Each unique "page" on the Web should be accessible with one and only one URL. The code above seems to allow any "business" page to be accessed at

example.com/business/<anything-at-all>/<anything-at-all>/<anything-at-all>/<business-name>.html

and that makes your site vulnerable to PageRank/Link-popularity dilution through malicious linking to "random" URLs. If someone points enough links at, for example,

example.com/business/that/cheats/customers/Acme.html

then the search engines may 'pick' that as the preferred search results listing URL for the real Acme business page on your site.

To avoid this, your rule or your viewclient.php script should probably be modified to check the validity of the entire URL-path.
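One way to let the script do that check, sketched under assumptions: the rule passes every path segment along instead of discarding the first three. The parameter names category, subcategory, and location are hypothetical, and viewclient.php would need to be changed to compare them against the stored values for the company and answer with a 404 (or a 301 to the correct URL) on a mismatch:

```apache
# Capture all four variable segments so the script can validate the full path.
# Only "companyname" exists in the original rule; the other parameter names
# are illustrative assumptions.
RewriteRule ^business/([^/]+)/([^/]+)/([^/]+)/([^/]+)\.html$ /viewclient.php?category=$1&subcategory=$2&location=$3&companyname=$4 [L]
```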

Jim

punisa

8:05 am on Mar 7, 2008 (gmt 0)

10+ Year Member



Hello again,

once again you perfectly noticed another flaw that I have here : )
I am aware that someone might put anything as a category type and still reach the specific business.

Obviously I thought nobody would do such a thing because there is no actual clickable link that would lead there, but as you pointed out Jim, there are a lot of bad people out there : (

I have set things up this way to be compatible with my home brew CMS system; I guess I could set things up to dynamically add full links into .htaccess for every company.
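A rough sketch of what such generated rules might look like, one per company. The category, subcategory, and location values shown here (automotive, repair, london, and the acme line) are invented for illustration:

```apache
# One fully literal rule per company, written out by the CMS.
# Any other path ending in the same company name no longer matches.
RewriteRule ^business/automotive/repair/london/john_mechanics\.html$ /viewclient.php?companyname=john_mechanics [L]
RewriteRule ^business/retail/hardware/leeds/acme\.html$ /viewclient.php?companyname=acme [L]
```

Since .htaccess is read on every request, a very long list of rules could become a cost; validating the path inside viewclient.php may scale better for a large directory.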

I'll get to it right away : )
Thanks again Jim !