| 11:59 pm on Feb 6, 2005 (gmt 0)|
Huh? Why would you want to keep Googlebot out if you're doing well with it? You don't want to bite the hand that feeds you.
| 12:04 am on Feb 7, 2005 (gmt 0)|
I probably didn't explain the question correctly.
I meant to say I am building a whole new page for MSN search and do not want Google to index the page, as it will be optimised for MSN.
How can I do this?
| 2:26 am on Feb 7, 2005 (gmt 0)|
You can either create a robots.txt or .htaccess based denial.
With robots.txt you let Googlebot figure out that you don't want it to index the page (it might still do it!).
With .htaccess you can deny access based on the user agent (say, anything that has "google" in the UA string) -- Googlebot will not even get to the robots.txt.
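A minimal sketch of both approaches. The robots.txt rule uses the published Googlebot user-agent token; the .htaccess rule assumes Apache with mod_rewrite enabled:

```
# robots.txt -- polite request: asks Googlebot to stay out of the whole site
User-agent: Googlebot
Disallow: /

# all other crawlers (MSNBot included) remain free to crawl
User-agent: *
Disallow:
```

```apache
# .htaccess -- hard denial: any UA containing "google" gets a 403 Forbidden
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} google [NC]
RewriteRule .* - [F]
```

The .htaccess variant refuses the request outright, so Googlebot never even fetches robots.txt; the trade-off is that it also serves a 403 to anything else with "google" in its user-agent string.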
| 6:54 am on Feb 7, 2005 (gmt 0)|
Google will find out. Do you really think they always identify themselves? How could they possibly stop cloaking?
| 7:03 am on Feb 7, 2005 (gmt 0)|
|How could they possibly stop cloaking? |
True, but Bek isn't talking about cloaking.
The objective is simply to exclude Googlebot altogether - something which robots.txt is designed to do and Googlebot will no doubt honour. There is no intent to deceive.
| 7:57 am on Feb 7, 2005 (gmt 0)|
Well, first of all there *is* an intent to deceive. He is making two versions of a page, one for MSN, one for Google. This is cloaking without sharing a URL. If he really means a "page," it may well be doorway-making too.
Apart from mundane infelicities (you sacrifice links to one of these pages), I don't think it will work to keep Google out. I remember reading somewhere that Google doesn't promise not to visit pages it's excluded from; it only promises not to put them in the SERPs. That's how they keep people honest. After all, you could have a single website with three versions of every page at different URLs.
I'm sure this has been tried, and Google's caught it.
| 8:07 am on Feb 7, 2005 (gmt 0)|
> I remember reading somewhere that Google doesn't promise not to visit pages it's excluded from; it only promises not to put them in the SERPs.
I hope you didn't read that here, suidas!
The opposite is the case. If Google sees a link to a /robots.txt excluded URL, then Google will not fetch it. It can still list the URL in the results without having to fetch the page.
If a URL is not /robots.txt excluded, but has a META robots tag with 'noindex', then if Google fetches the URL it will not be listed.
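For reference, the META tag route mentioned above would look something like this (a sketch; note the page must NOT also be blocked in robots.txt, or Googlebot will never fetch it and so never see the tag):

```html
<!-- in the page's <head>: tells any crawler that fetches the page not to list it -->
<meta name="robots" content="noindex">

<!-- or target Googlebot specifically, leaving MSNBot free to index -->
<meta name="googlebot" content="noindex">
```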
> sure this has been tried, and Google's caught it
That someone has a Web site, and that there's some part of the Web that shouldn't be crawled? I don't see how Google would see a quality issue there, unless they didn't like the site they were crawling.
The 'dual site' approach using Robots Exclusion Protocol would sacrifice links though.
| 8:35 am on Feb 7, 2005 (gmt 0)|
|The 'dual site' approach using Robots Exclusion Protocol would sacrifice links though. |
But how important are links for a page to do well on MSN?