
Sitemaps, Meta Data, and robots.txt Forum

    
Google CHTML Proxy disobeying robots.txt
Why?
yosmc

10+ Year Member



 
Msg#: 439 posted 1:35 am on Aug 31, 2004 (gmt 0)

I set up a bad bot trap to disallow spiders that don't care about my robots.txt file, and to my surprise I caught Google CHTML Proxy/1.0 from IP 216.239.39.5 (which is a Google IP).

The robots.txt file has been unchanged for weeks, definitely enough time for any willing spider to read it. Any ideas why it's still being ignored by this particular bot?
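For anyone reproducing this setup, here is a minimal sketch of how one might confirm that the robots.txt rules really do disallow the trap directory for a given user agent before blaming the bot. The `/trap/` path, the sample rules, and the `is_allowed` helper are assumptions for illustration only; the original post does not show the actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the real file from the post is not shown.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    agent = "Google CHTML Proxy/1.0"
    for path in ("/index.html", "/trap/page.html"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, path) else "disallowed"
        print(f"{agent} -> {path}: {verdict}")
```

If the trap path comes back as disallowed here, the rules themselves are fine and the remaining question is whether the visitor is a crawler that should be reading them at all.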

 

PeterD

10+ Year Member



 
Msg#: 439 posted 1:51 am on Aug 31, 2004 (gmt 0)

A proxy doesn't have to respect robots.txt because it's not a spider. There's a person behind it. Check out

[webmasterworld.com...]

Pete
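Following Pete's point, a bot trap could exempt user-driven proxies rather than banning them. The sketch below is only one possible approach under stated assumptions: the `/trap/` prefix, the `USER_DRIVEN_PROXY_TOKENS` list, and the `should_ban` function are all hypothetical names, and matching on the User-Agent string alone is weak because that header can be spoofed.

```python
# Hypothetical allow-list of user-agent substrings for proxies that fetch
# pages on behalf of a person rather than crawling on their own.
USER_DRIVEN_PROXY_TOKENS = ("google chtml proxy",)


def should_ban(user_agent: str, path: str, trap_prefix: str = "/trap/") -> bool:
    """Decide whether a request that hit the trap should be banned.

    Requests outside the trap directory are never banned. Requests inside
    it are banned unless the user agent matches a known user-driven proxy.
    """
    if not path.startswith(trap_prefix):
        return False
    ua = user_agent.lower()
    return not any(token in ua for token in USER_DRIVEN_PROXY_TOKENS)


if __name__ == "__main__":
    print(should_ban("Google CHTML Proxy/1.0", "/trap/page.html"))
    print(should_ban("BadBot/2.0", "/trap/page.html"))
```

In practice it would probably be wise to pair the user-agent check with an IP check, since the original post identified the visitor partly by its Google IP address.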

yosmc

10+ Year Member



 
Msg#: 439 posted 2:14 am on Aug 31, 2004 (gmt 0)

Thanks Pete, still learning something new every day. :)

I'm still a little bit confused, though. According to that information, Google CHTML Proxy shouldn't be sniffing around in hidden directories that are only accessible via hidden links. (If it's just following human clicks, why does it follow a link that cannot be seen by humans?) Hm...

jimbeetle

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 439 posted 2:35 am on Aug 31, 2004 (gmt 0)

This WW thread might help:

[webmasterworld.com...]

And I'm just going to suppose here: what's displayed in a cHTML-capable browser may be different from what a "normal" HTML browser shows. Might it be that those "hidden" links aren't so hidden?

Just speculation.

yosmc

10+ Year Member



 
Msg#: 439 posted 12:16 pm on Aug 31, 2004 (gmt 0)

Ah ok, thanks - now it makes sense.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved