|Questions regarding cloaking|
My basic approach to cloaking is essentially to present the spiders with a scaled-down version of my actual pages, sans the graphics and table structures. The content will be exactly the same, but on my spider food pages I'll also add links to my various doorway domains in a random fashion, to increase the inbound links and link popularity of my doorway domains (which are all related to my primary site). Every cloaked page will have a "real" equivalent. I'm assuming this is a sane approach (?)
My main concern is this: I plan on creating a cloaked version of each page specifically tweaked to each search engine, so in essence each page will have the "real" version, plus a cloaked equivalent for Google, one for Lycos, one for AltaVista, etc.
I'm worried about the spiders finding these pages and perceiving them as duplicates. There will be minor differences, but the actual page size and content of each will be similar.
What are my options? Do I need to place each set of engine-specific pages in its own directory, or do I need to include robots.txt files to keep the spiders from finding the dupes?
I'm confused! I'm hoping the cloaking gurus who hang out here will have some input on this matter.
>I'm assuming this is a sane approach (?)
Yes, quite! But the links to the doorway domains could be a problem, depending on what the doorway domains contain (content and structure). I would probably opt to link the doorway domains to each other but not link from the main site to the DDs, unless they are related, don't look like purely doorway domains, and don't duplicate content (especially all the same links). In that case linking to them is fine.
>I'm worried about the spiders finding these pages and perceiving
>them as duplicates...
I would not rely on robots.txt to keep spiders away. Create separate directories and have the script retrieve the content from the appropriate directory for the spider making the request.
Sure doesn't sound like it :)
I do pretty much the same on several sites. The content is near identical. The page titles, headers, and metas are different. I strip off the majority of the html and leave a skeleton page that isn't all that bad to the eye. I rarely worry about the engines viewing them as dupes. If the size and structure are radically different, most engines won't look at the actual content as the same.
Especially when you consider standard menus and footers that are present on the 'real' pages. If you have 20 links in menus/footers/page headers, then that is enough of a difference that the engines won't spot it. The only engines that are even close to seeing that as a dupe would be Ink or Excite. I have recently gained an appreciation for Excite's dupe checker the hard way (they nuked three mirrors that were pretty close: different templates, but the templates had the same link text).
Also, make the filename very different. Don't try "foo.htm" and "foo2.htm" - do "foo.htm" and "bar.htm".
As far as dupe checking goes:
Google: nonexistent (throw dupes to Google at will - 100% identical pages not being caught)
Alta: fair to poor. Only compares html. (pages within 1-3% the same are found).
Fast: fair to good. Only compares html. (pages within 1-5% are being found)
NL: fair, but nothing fancy. Only compares html, but is better than Alta's.
Ink: real good. Stripped content comparison. 2-5% alike are found. (titles/urls/headers are enough to get past it).
Excite: excellent. A very good comparison checker for stripped content. (Title/tags/url are thrown out before the compare).
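For anyone wondering what "only compares html" versus "stripped content comparison" means in practice, here is a rough Python sketch. It is not any engine's actual algorithm, just an assumption about the general idea: throw out the tags and the title, then compare what is left. The names `TextExtractor`, `stripped_words` and `similarity` are made up for this example.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible words, ignoring <title>, <script> and <style>."""

    SKIPPED = {"title", "script", "style"}

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words.extend(data.lower().split())


def stripped_words(html):
    """Return the lowercase words left after tags and title are removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.words


def similarity(html_a, html_b):
    """Score in [0, 1]: how alike the two pages are once stripped."""
    matcher = SequenceMatcher(None, stripped_words(html_a), stripped_words(html_b))
    return matcher.ratio()


def raw_similarity(html_a, html_b):
    """Score in [0, 1] on the raw markup, tags and all."""
    return SequenceMatcher(None, html_a, html_b).ratio()
```

With a table-heavy page and a skeleton page carrying the same body text but different titles, `raw_similarity` sees two different documents while `similarity` sees the same one. That is the difference between a checker that is fooled by a new template and one that is not.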
Air & Brett-
Thanks for the advice, it's quite valuable coming from the two of you. My approach to the doorway domains is to create content-rich 10-page sites that all differ markedly from each other.
Each doorway domain and pages within are based upon derivations of a keyphrase, and ultimately lead to the main site-EG-
Many people seem to lament the time involved with cloaking. For me it's much easier to create these scaled-down cloaked versions of my real pages than it is to alter my existing pages to make them more attractive to the spiders. I have a good spider base and updating is a snap.
I don't do a great deal of cloaking. The main thing I do is strip away layers of html to get down to the core content. SEs love high text-to-html ratios.
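If you want to put a number on the text-to-html ratio, something like the Python sketch below would do it. The definition used here (visible characters divided by total characters in the source) is my own assumption; nobody outside the engines knows how, or whether, they measure it. The function name `text_to_html_ratio` is made up for the example.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Gathers text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def text_to_html_ratio(html):
    """Visible text length divided by total source length (0.0 if empty)."""
    if not html:
        return 0.0
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so indentation does not count as text.
    text = " ".join(" ".join(parser.chunks).split())
    return len(text) / len(html)


heavy = "<table><tr><td><font size='2'><b>Blue widgets</b></font></td></tr></table>"
lean = "<p>Blue widgets</p>"
print(text_to_html_ratio(heavy) < text_to_html_ratio(lean))
```

Both snippets carry the same twelve characters of text, so the stripped-down one scores higher purely because there is less markup around it.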