Would This be Duplicate Content?

An issue with robots.txt.

jake66

3:34 am on Nov 22, 2005 (gmt 0)

10+ Year Member



I have mod_rewrite enabled.

In robots.txt I have listed both the *.php and *.html versions of the pages I want to keep out of the results (contact, account, etc.).

Do the search engines still peek at these pages and mark them as dupes because they're the exact same page, and then give the entire domain a dupe hit?

Clearly, I have no idea how duplicate content is assigned. ;) If it's as simple as page A having one URL and page B having another URL with the exact same content, then I don't need further explanation, but I am still curious about the robots.txt issue.

oddsod

4:39 pm on Nov 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> i have no idea how duplicate content is assigned.

The problem is that neither have they (or so it seems with Google sometimes). Some SEs handle dup content better than others, and the mod_rewrite you put in was probably for Google, as it seems to have the biggest problem with this.

Blocking in robots.txt means this: Yahoo undertakes not to read disallowed pages (and won't list them either). Google will feel free to read them but won't list them in SERPs. In theory, you have nothing to worry about if those dup pages are disallowed. In practice, nothing is guaranteed with Google. They screwed up on the canonical issues and hijacking, for instance, so they can screw up again. And one day, if you make a small mistake in robots.txt, you may find that Google has all your dup pages in the supplemental index and is treating them as duplicates. There is no guaranteed appeal process.
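
Since a small mistake in robots.txt is the risk mentioned above, one way to guard against it is to check that both the .php and .html versions of each page really are disallowed. The following is only a rough sketch using Python's standard urllib.robotparser. The robots.txt content, the paths, and the helper name blocked_paths are made up for illustration and are not taken from this thread. The standard-library parser matches rules by path prefix and does not expand wildcards, so each URL is listed explicitly here. It also says nothing about how any particular search engine actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: paths are assumptions, not from the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contact.php
Disallow: /contact.html
Disallow: /account.php
Disallow: /account.html
"""


def blocked_paths(robots_txt, paths, agent="*"):
    """Return {path: True if the path is disallowed for the given agent}."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return {path: not parser.can_fetch(agent, path) for path in paths}


# Both URL forms of each private page, plus one page that should stay open.
result = blocked_paths(
    ROBOTS_TXT,
    ["/contact.php", "/contact.html", "/account.php", "/account.html", "/index.html"],
)

for path, blocked in result.items():
    print(path, "blocked" if blocked else "allowed")
```

If one of the two URL forms is missing from robots.txt, it will show up as "allowed" in the output, which is the kind of slip that could leave a duplicate open to crawlers.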