My main concern is this... I plan on creating a cloaked version of each page, specifically tweaked for each search engine, so in essence each page will have the "real" version plus a cloaked equivalent for Google, one for Lycos, one for AltaVista, etc.
I'm worried about the spiders finding these pages and perceiving them as duplicates; there will be minor differences, but the actual page size and content of each will be similar.
What are my options? Do I need to place each set of engine-specific pages in its own directory, or do I need a robots.txt file to keep the spiders from finding the dupes?
I'm confused! I'm hoping the cloaking gurus who hang out here will have some input on this matter.
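For reference, the robots.txt option mentioned above would look something like this; the directory names are made up for illustration, and note that a Disallow applies to the named agent(s), so it only hides directories from crawlers that honor it:

```
# Hypothetical robots.txt keeping all compliant spiders out of
# the engine-specific directories (paths are examples only)
User-agent: *
Disallow: /google-pages/
Disallow: /lycos-pages/
Disallow: /altavista-pages/
```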
Yes, quite! But the links to the doorway domains could be a problem, depending on what the doorway domains contain (content and structure). I would probably opt to link the doorway domains to each other but not link from the main site to the DDs, unless they are related, don't look like purely doorway domains, and don't duplicate content (especially all the same links); in that case linking to them is fine.
>I'm worried about the spiders finding these pages and perceiving
>them as duplicates...
I would not rely on robots.txt to keep spiders away. Instead, create separate directories and have the script retrieve the content from the appropriate directory for the spider making the request.
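A minimal sketch of that directory-per-spider approach, assuming the script keys off the requesting user-agent; the agent strings and directory names here are illustrative, not from the post:

```python
# Hypothetical mapping from spider user-agent substrings to per-engine
# content directories. Names are examples only.
SPIDER_DIRS = {
    "Googlebot": "content/google",
    "Lycos": "content/lycos",
    "Scooter": "content/altavista",  # AltaVista's crawler
}
DEFAULT_DIR = "content/real"  # what ordinary visitors get


def content_dir_for(user_agent):
    """Pick the directory to serve from, based on the requesting agent."""
    for needle, directory in SPIDER_DIRS.items():
        if needle in user_agent:
            return directory
    return DEFAULT_DIR


print(content_dir_for("Googlebot/2.1"))  # content/google
print(content_dir_for("Mozilla/4.0"))    # content/real
```

Since the per-engine pages live in directories the web server never exposes directly, spiders only ever see the version the script hands them.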
Sure doesn't sound like it :)
Especially when you consider the standard menus and footers that are present on the 'real' pages. If you have 20 links in menus/footers/page headers, that is enough of a difference that the engines won't spot it. The only engines that are even close to seeing that as a dupe would be Ink or Excite. I have recently gained an appreciation for Excite's dupe checker the hard way (they nuked three mirrors that were pretty close: different templates, but the templates had the same link text).
Also, make the filename very different. Don't try "foo.htm" and "foo2.htm" - do "foo.htm" and "bar.htm".
As far as dupe checking goes:
Google: nonexistent (throw dupes at Google at will; 100% identical pages are not being caught).
Alta: fair to poor. Only compares HTML (pages within 1-3% the same are found).
Fast: fair to good. Only compares HTML (pages within 1-5% are being found).
NL: fair, but nothing fancy. Only compares HTML, but is better than Alta's.
Ink: real good. Stripped-content comparison; pages 2-5% alike are found (titles/URLs/headers are enough to get past it).
Excite: excellent. A very good comparison checker for stripped content (title/tags/URL are thrown out before the compare).
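To make the "stripped content" idea concrete, here is a rough sketch of what such a checker might do: throw out the title and tags, then compare the remaining text. This is only my guess at the technique; the regexes and the similarity measure are assumptions, not anything the engines are known to use:

```python
import re
from difflib import SequenceMatcher


def strip_page(html):
    """Drop the title and all tags so only visible text is compared."""
    html = re.sub(r"<title>.*?</title>", " ", html, flags=re.I | re.S)
    text = re.sub(r"<[^>]+>", " ", html)   # remove remaining tags
    return " ".join(text.lower().split())  # normalize whitespace/case


def similarity(page_a, page_b):
    """Return a 0.0-1.0 ratio of how alike the stripped pages are."""
    return SequenceMatcher(None, strip_page(page_a), strip_page(page_b)).ratio()


# Two pages differing only in title and template markup:
a = "<html><title>Foo</title><body><h1>Widgets</h1>Buy widgets here.</body></html>"
b = "<html><title>Bar</title><body><h1>Widgets</h1>Buy widgets here.</body></html>"
print(similarity(a, b))  # 1.0 -- identical once stripped
```

An HTML-only comparison (Alta/Fast style, per the list above) would see these two pages as different, while a stripped-content comparison flags them as exact dupes, which is why different titles and tags aren't enough to get past Ink or Excite.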
Thanks for the advice; it's quite valuable coming from the two of you. My approach to the doorway domains is to create content-rich 10-page sites that all differ markedly from each other.
Each doorway domain and the pages within are based upon derivations of a keyphrase, and ultimately lead to the main site, e.g. -
Many people seem to lament the time involved with cloaking. For me, it's much easier to create these scaled-down cloaked versions of my real pages than it is to alter my existing pages to make them more attractive to the spiders. I have a good spider base and updating is a snap.