mikomido - 4:42 pm on Jul 27, 2007 (gmt 0)
For some reason, I find it very hard to believe that Google and the other search engines would do this. It sounds illegal and fraudulent. Also, random elements, advertisements and all sorts of other data can legitimately differ between two requests, which makes it very dangerous to treat any two differing copies of the same URL, even ones fetched within the same minute, as an attempt to "cloak". But how would they be able to detect cloaking otherwise?

FYI, I do cloaking (isn't it obvious?). But I only do it for things that should only be interesting to bots, such as certain META data and misc. other things that browsers never need. I don't see anything bad in this; I'm just trying to save bandwidth by not sending useless info to all clients. So... can you shed any light on this?

Also, while I'm asking, why didn't HTTP include an "Is-robot" header? It would make things much easier. As it is now, we have to guess from the User-agent header whether a request comes from a robot or a human. I currently do this by sniffing for "*bot*" and "*crawl*", but it's far from perfect, of course. Even if it's not part of HTTP, why don't nice (non-malicious) bots send "Is-bot: true" or something? It would help me and others a lot to save bandwidth.
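To make the "*bot*" / "*crawl*" sniffing concrete, here is a minimal Python sketch of that kind of check. The function name is_probable_bot and the marker list are made up for illustration, not taken from any real library, and the check is only a guess based on a header that any client can set to whatever it likes.

```python
# Hypothetical helper: guesses whether a request comes from a crawler by
# looking for the substrings "bot" and "crawl" in the User-agent header.
BOT_MARKERS = ("bot", "crawl")


def is_probable_bot(user_agent):
    """Return True if the User-agent string contains a known bot marker."""
    if not user_agent:
        # A missing or empty header tells us nothing, so treat it as a browser.
        return False
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
        # A crawler whose User-agent contains neither marker, so it is missed.
        "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    ]
    for ua in samples:
        print(is_probable_bot(ua), ua)
```

The third sample shows why this is far from perfect: a crawler that does not happen to have "bot" or "crawl" in its name slips through, and a browser or script that fakes a bot name gets matched.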
How do search engines check for cloaking? As far as I can see, the only way this can be done is for the search engines' bots to regularly crawl every page TWICE (instead of once), one of the times with a faked User-agent header (corresponding to MSIE 6), and then compare the two copies to see if anything differs.
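For illustration only, the two-fetch comparison described here could be sketched in Python as below. This is not how any search engine is known to work; it is just the idea from this post written out. The names looks_cloaked and fetch, the User-agent strings and the 0.9 threshold are all assumptions. The fetch argument is any function that takes a URL and a User-agent string and returns the page body as text, and a similarity threshold is used instead of an exact match so that small differences such as rotating ads do not count as cloaking.

```python
import difflib

# Assumed User-agent strings for the two requests.
BOT_UA = "ExampleBot/1.0"
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def looks_cloaked(fetch, url, threshold=0.9):
    """Fetch url twice with different User-agents and compare the bodies.

    fetch is a caller-supplied function (url, user_agent) -> str.
    Returns True when the two bodies are less similar than threshold.
    """
    bot_body = fetch(url, BOT_UA)
    browser_body = fetch(url, BROWSER_UA)
    # ratio() is 1.0 for identical strings and 0.0 when nothing matches.
    similarity = difflib.SequenceMatcher(None, bot_body, browser_body).ratio()
    return similarity < threshold


if __name__ == "__main__":
    # Stand-in for a real HTTP client: serves different pages to bots.
    def fake_fetch(url, user_agent):
        if "Bot" in user_agent:
            return "keyword keyword keyword keyword"
        return "<html><body>Welcome to my shop!</body></html>"

    print(looks_cloaked(fake_fetch, "http://example.com/"))
```

Even with a threshold, a page that is mostly dynamic content would trip this check without any cloaking going on, which is exactly the worry about random elements and advertisements.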