| 6:58 pm on Mar 30, 2001 (gmt 0)|
An interesting take on the issue. I sure don't know how they are going to spot cloaking if they can't spot a duplicate. They have all the power in their hands to control cloaking. I haven't seen anything that I would even call remotely a problem with cloaking on google. I thought they were "spam proof"?
| 1:05 am on Mar 31, 2001 (gmt 0)|
I wonder if Mike is related to Matt?
| 1:17 am on Mar 31, 2001 (gmt 0)|
Boy, Google is really flip-flopping on their stance in public statements regarding cloaking. In the last couple of months they have gone from "cloaking is OK, even necessary at times" to "we don't like cloaking" (these two statements made days apart). Very strange...
| 1:39 am on Mar 31, 2001 (gmt 0)|
If you are cloaking to google, I'd take the googlebot/noarchive tag out of your code at this point.
| 5:29 am on Mar 31, 2001 (gmt 0)|
They should know it's a losing battle; a well-thought-out script can and will, within a few minutes, detect and defeat any attempt by a search engine at spotting cloaking.
Well, since they gave warning, I guess I should get it written now and be ready. Nothing better than a new challenge to keep things fresh and exciting...
| 11:36 pm on Apr 1, 2001 (gmt 0)|
It's not a losing battle. All any search engine has to do to combat cloaking is rent the IP addresses of AOL occasionally for their spiders. Are you going to serve your cloaked pages to AOL all the time?
The only reason why cloaking works in the first place is that spiders use different IP addresses and user agents than people.
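To make the mechanism concrete, here is a minimal sketch of the kind of cloaking being discussed: serving different content depending on the requesting user agent. The bot names and page labels are invented for illustration; real cloaking scripts of the era also keyed on known spider IP addresses.

```python
# Minimal illustrative sketch of user-agent cloaking.
# The spider names and page labels below are hypothetical examples.
KNOWN_SPIDER_AGENTS = ("googlebot", "scooter", "slurp")

def select_page(user_agent):
    """Serve the keyword-optimized page to known spiders and the
    normal page to everyone else. This breaks down the moment a
    spider shows up with a browser user agent from an unknown IP."""
    ua = user_agent.lower()
    if any(bot in ua for bot in KNOWN_SPIDER_AGENTS):
        return "optimized-page-for-spiders"
    return "normal-page-for-humans"

print(select_page("Googlebot/2.1"))                  # optimized-page-for-spiders
print(select_page("Mozilla/4.0 (compatible; ...)"))  # normal-page-for-humans
```

The whole scheme rests on that `if`: as soon as the spider is indistinguishable from a browser, the script has nothing left to key on.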
| 3:00 am on Apr 2, 2001 (gmt 0)|
Xoc, I don't agree. All spiders, whether disguised or not, follow patterns,
and all patterns can be recognized and dealt with on the fly appropriately.
The more sites/domains/subdomains/pages you have, the easier it is to recognize patterns followed by spiders.
Finally, it is easier to build the defensive/passive system (the cloak), because you only have to somewhat reverse-engineer in response to the attacking/active system (the spider).
Ain't this cool stuff...
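The defensive pattern-matching described above might look something like this sketch: flag a client as a probable spider when it requests many pages in a short window without ever sending a referrer. The window and threshold are invented numbers; a real script would combine more signals (no image requests, link-walking order, known IP blocks).

```python
# Hypothetical spider-detection heuristic along the lines described
# above. Thresholds are made up for illustration.
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_PAGES_NO_REFERRER = 10

hits = defaultdict(list)  # ip -> list of (timestamp, had_referrer)

def record_hit(ip, timestamp, referrer=None):
    """Record one page request; return True if this IP now looks
    like a spider (many referrer-less hits inside the window)."""
    hits[ip].append((timestamp, bool(referrer)))
    # Keep only hits inside the sliding window.
    recent = [h for h in hits[ip] if timestamp - h[0] <= WINDOW_SECONDS]
    hits[ip] = recent
    no_referrer = sum(1 for _, had_ref in recent if not had_ref)
    return no_referrer >= MAX_PAGES_NO_REFERRER
```

Once an IP trips the heuristic, the cloaking script would switch it over to the "spider" version of the site on the fly, which is the "detect and deal with it" step being argued for here.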
| 3:19 am on Apr 2, 2001 (gmt 0)|
What you are saying is that a web spider isn't smart enough to act human. Sorry, but this is a Turing Test, and as Turing Tests go, this is a very easy one. With proper programming, a web spider can be indistinguishable from a normal human browser. They don't have to be all the time, just often enough to catch cloakers some of the time. Then it becomes a game of whether you are going to be lucky this week. Just because web spiders haven't done this up until now doesn't mean it isn't coming.
| 5:37 am on Apr 2, 2001 (gmt 0)|
Just to throw something into the mix...
I don't think we will ever see *legitimate* spiders faking a referrer. The MediaMetrix types, and others, would blow a gasket if that were to happen.
So I guess the question is not whether a bot could look like a human, but rather whether the SEs would do what it would take to truly emulate a surfer.
| 5:45 am on Apr 2, 2001 (gmt 0)|
I think it is only a matter of time. Sure, for a spider that is indexing, I think they will always give a legit user-agent field and hit robots.txt. But for a spider that is trying to catch cloaking, I think they will imitate a human on a non-indexing-spider IP address in such a way that you won't be able to tell. Some random time after the spider visits, you'll just quietly get banned.
If I were doing a search engine, I'd be working on this right now. To begin with, I'd have human review of the to-be-banned sites. But after perfecting the cloaking-spider, I'd just let it run.
| 5:54 am on Apr 2, 2001 (gmt 0)|
Plus, I just thought of another thing they could do. They could create a proxy server for the top 10 search engine results for any keyword combination. Hits on those would go through their proxy server using the same IP addresses that they index on instead of directly. So if you actually got into the top 10 for any keyword combination, the end-user would see exactly what the spider sees. This would be similar to the Google cache, but updated much more frequently. This would effectively kill improper cloaking on that search engine.
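A rough sketch of that proxy idea: the engine rewrites each top-10 result link to route the click through its own proxy, which then fetches the page from the same IP range the indexing spider uses. The proxy hostname and query format here are invented for illustration.

```python
# Hypothetical sketch of the result-proxy idea described above.
# PROXY_BASE is an invented URL; a real engine would fetch the target
# from its crawler IP range so users see what the spider saw.
from urllib.parse import quote

PROXY_BASE = "http://proxy.example-engine.com/fetch?url="

def proxify_results(result_urls):
    """Rewrite the top-10 result URLs so clicks go through the
    engine's proxy instead of directly to the site."""
    return [PROXY_BASE + quote(url, safe="") for url in result_urls[:10]]

print(proxify_results(["http://example.com/page"]))
```

The effect is that a site cloaking to the spider would show its spider page to every searcher who clicks through, which defeats the point of cloaking on that engine.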
| 6:02 am on Apr 2, 2001 (gmt 0)|
OK, well, we are not really all that far apart. AV actually was doing much of this during the hardcore "Men in Black" days. There are some ways to minimize this type of risk. It wouldn't be in the interest of the SEO community to publicly discuss them, but there are strategic models to minimize the chance of a rogue spider slipping through the cracks.
|SEO is my Bag Baby|
| 2:49 pm on Apr 2, 2001 (gmt 0)|
Another quote from Matt Cutts (Google software engineer):
"...I will say that the google way is to try to do everything automatically...Any time you have a person in the loop it slows things down..."
So it looks like they will indeed be setting up an automatic "seek and destroy" machine. At this point I wouldn't put it past them to make a 'tricky' 'smart' 'human-like' spider.
Sounds like they hired Dr. Evil! He's going to steal my mojo!
| 3:08 pm on Apr 2, 2001 (gmt 0)|
They will look for the "no cache tag" to find out who is cloaking and who is not...
| 6:21 pm on Apr 2, 2001 (gmt 0)|
They don't need to look at the "no cache" tag. The "no cache" tag only stops them from posting the cached page. It doesn't stop them from caching the page.
I expect them to randomly select 1% of the pages in their database and check them for cloaking each month. No referrers. No multiple hits on the same site. No walking links. Just a one-page access within a minute of an indexing-spider hitting the page coming in from a random AOL IP address. The AOL hit would come before the indexing-spider hit.
Rapidly changing dynamic content would get penalized, but maybe that isn't the best thing for a spider to index anyway. If it can't find the same words that it indexed when it comes in from AOL, then that page probably shouldn't be in the index in the first place. They might allow some kind of markup, say a <div class="dynamic"> tag, around parts of the page holding dynamic content that wouldn't get indexed at all.
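The spot-check described above boils down to comparing the copy served to the indexing spider against the copy served to an apparent human, with some tolerance for dynamic content. Here is a sketch of that comparison using a word-level similarity ratio; the threshold is an invented number, and the two strings stand in for full page fetches.

```python
# Sketch of the cloaking spot-check described above: compare the page
# text the spider was served against the text an anonymous "human"
# fetch received. The 0.8 threshold is hypothetical.
import difflib

def looks_cloaked(spider_copy, anonymous_copy, threshold=0.8):
    """Flag probable cloaking when the two copies differ too much.
    Legitimately dynamic pages (rotating headlines, dates) should
    still score well above the threshold."""
    ratio = difflib.SequenceMatcher(
        None, spider_copy.split(), anonymous_copy.split()).ratio()
    return ratio < threshold

print(looks_cloaked("cheap widgets buy now", "cheap widgets buy now"))  # False
print(looks_cloaked("keyword keyword keyword", "welcome to our site"))  # True
```

Marked-up dynamic regions like the suggested `<div class="dynamic">` would simply be stripped from both copies before the comparison.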
Sorry if this makes you mad. But a SE is not in the business of making it easy to get good results for a specific site. They are in the business of providing relevant results to someone using their engine. If they have to penalize some legit sites to provide relevant results, so be it. There is an easy way around the issue: don't cloak to that search engine.
| 10:54 pm on Apr 2, 2001 (gmt 0)|
Xoc, no self-respecting SE guy should ever get mad at SE engineers; they do their job, we do ours. Mind you, this is a game for folk who are patient and who don't get emotional (hmm, kinda like stock market investing).
Like littleman said, no need to reveal too much. From your perspective it's easy for an SE to defeat cloaking; from my perspective it's easy to defend against any such attempts.
The truth, as always, is probably somewhere in between.