
Forum Moderators: goodroi


Google CHTML Proxy disobeying robots.txt

Why?

     
1:35 am on Aug 31, 2004 (gmt 0)

10+ Year Member



I set up a bad bot trap to disallow spiders that don't care about my robots.txt file, and to my surprise I caught Google CHTML Proxy/1.0 from IP 216.239.39.5 (which is a Google IP).

The robots.txt file has been unchanged for weeks, definitely enough time for any willing spider to read it. Any ideas why it's still being ignored by this particular bot?
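For readers unfamiliar with the setup described above, here is a minimal sketch of the check a bot trap relies on: a request for a path that robots.txt disallows means the client did not honor the file. The robots.txt contents, the `/trap/` path, and the `is_trap_hit` helper are all hypothetical; the actual file and trap directory are not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the poster's real file is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def is_trap_hit(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `path` is off limits to `user_agent` under robots.txt.

    A logged request for such a path means the client either ignored
    robots.txt or, like a proxy, was never bound by it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # The user-agent string is the one reported in the post above.
    print(is_trap_hit("Google CHTML Proxy/1.0", "/trap/index.html"))
    print(is_trap_hit("Google CHTML Proxy/1.0", "/index.html"))
```

Note that this only tells you a disallowed path was requested. It cannot tell a misbehaving spider from a proxy relaying a human's click.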

1:51 am on Aug 31, 2004 (gmt 0)

10+ Year Member



A proxy doesn't have to respect robots.txt because it's not a spider. There's a person behind it. Check out

[webmasterworld.com...]

Pete

2:14 am on Aug 31, 2004 (gmt 0)

10+ Year Member



Thanks Pete, still learning something new every day. :)

I'm still a little bit confused, though. According to that information, Google CHTML Proxy shouldn't be sniffing around in hidden directories that are only accessible via hidden links. (If it's just following human clicks, why does it follow a link that cannot be seen by humans?) Hm...

2:35 am on Aug 31, 2004 (gmt 0)

jimbeetle (WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member)



This WW thread might help:

[webmasterworld.com...]

And I'm just supposing here: what if what's displayed in a cHTML-capable browser is different from what a "normal" HTML browser shows? Those "hidden" links might not be so hidden after all.

Just speculation.
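To make that guess concrete, here is a rough sketch of why a link hidden with styling could still surface. It assumes, without any confirmation from Google, that a transcoding proxy reads the raw markup and does not apply CSS. The sample page, the `/trap/` link, and the `LinkCollector` class are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page: the second link is hidden from ordinary browsers by CSS only.
PAGE = """
<a href="/products.html">Products</a>
<a href="/trap/" style="display:none">do not follow</a>
"""


class LinkCollector(HTMLParser):
    """Collect every href in the markup, ignoring any styling."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list:
    """Return all link targets found in `html`, visible or not."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    print(extract_links(PAGE))
```

Anything that works from the markup alone sees both links, so a stripped-down rendering could present the "hidden" one as an ordinary clickable link.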

12:16 pm on Aug 31, 2004 (gmt 0)

10+ Year Member



Ah ok, thanks - now it makes sense.
 
