I can see why they disallow so many robots, but I was wondering why they don't allow msnbot?
By "the one that you see", did you mean "the one that your browser might get"? Browsers never request robots.txt on their own.
What type of server serves different versions of the file for different requests/user-agents/spiders?
> What type of server serves different versions of the file for different requests/user-agents/spiders?
Mine do. It's one way to cut the bandwidth consumed by robots that don't understand records with multiple User-agent lines. Detect those UAs and serve them a simplified robots.txt with their UA string inserted. On Apache, a combination of mod_rewrite and some simple CGI scripting makes this easy.
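A minimal sketch of the CGI side, assuming mod_rewrite has already been set up to hand requests for /robots.txt to this script. The robot names, rules, and file contents below are hypothetical placeholders, not anyone's actual configuration. Robots on the "simple" list get a single-record file with only one User-agent line, naming the matched robot; everyone else gets the full multi-record file.

```python
import os

# Hypothetical UA substrings for robots assumed NOT to understand
# records with several User-agent lines. Replace with your own list.
SIMPLE_ROBOTS = ["ExampleBot", "OldCrawler"]

# Hypothetical full robots.txt, including a multi-User-agent record.
FULL_ROBOTS_TXT = """User-agent: msnbot
User-agent: Googlebot
Disallow: /cgi-bin/

User-agent: *
Disallow: /
"""

# Hypothetical rules given to each "simple" robot in its own record.
SIMPLE_RULES = ["Disallow: /cgi-bin/"]


def robots_txt_for(user_agent: str) -> str:
    """Return a robots.txt body tailored to the requesting user-agent."""
    for name in SIMPLE_ROBOTS:
        if name.lower() in user_agent.lower():
            # One record with a single User-agent line naming this robot.
            return "\n".join([f"User-agent: {name}"] + SIMPLE_RULES) + "\n"
    # Robots that handle multi-UA records (and anything unknown) get the full file.
    return FULL_ROBOTS_TXT


if __name__ == "__main__":
    # Under CGI, Apache passes the request's User-Agent header as HTTP_USER_AGENT.
    ua = os.environ.get("HTTP_USER_AGENT", "")
    print("Content-Type: text/plain")
    print()
    print(robots_txt_for(ua), end="")
```

Matching on a substring of the UA keeps the check tolerant of version numbers and "compatible;" wrappers. The trade-off is that a spoofed UA gets the same simplified file as the real robot.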
jim_w,
Some "bad" robots are in fact spoofs of legitimate user-agents. Where the legitimate robot visits but is of no practical use to the site owner, it may be Disallowed in robots.txt. Stronger measures are still necessary for the spoofers, but the robots.txt Disallow helps identify them, because they either don't fetch robots.txt at all, or they fetch it and then ignore its contents. So no, it's not entirely a waste of time.
Jim
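The spoofer check described above can be sketched as a simple log scan. This assumes the robot in question is blocked entirely (`Disallow: /`) and that log lines have already been parsed into hypothetical `(client_ip, user_agent, path)` tuples. Under a blanket Disallow, any request other than /robots.txt from a UA claiming to be that robot is suspect. The code then separates clients that never fetched robots.txt from those that fetched it and ignored it. Treat the output as a list of suspects to check further, not as proof.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed pre-parsed log entry: (client_ip, user_agent, request_path).
LogEntry = Tuple[str, str, str]


def find_spoofers(entries: Iterable[LogEntry], robot_token: str) -> Dict[str, str]:
    """Flag IPs claiming to be `robot_token` that request pages even though
    robots.txt is assumed to Disallow that robot from the whole site."""
    token = robot_token.lower()
    fetched_robots: Set[str] = set()
    fetched_pages: Set[str] = set()

    for ip, ua, path in entries:
        if token not in ua.lower():
            continue  # not claiming to be this robot
        if path == "/robots.txt":
            fetched_robots.add(ip)
        else:
            fetched_pages.add(ip)

    # Classify each offending IP by whether it ever asked for robots.txt.
    suspects: Dict[str, str] = {}
    for ip in fetched_pages:
        if ip in fetched_robots:
            suspects[ip] = "fetched robots.txt but ignored it"
        else:
            suspects[ip] = "never fetched robots.txt"
    return suspects
```

For example, with msnbot disallowed, a client sending an msnbot UA that only ever requests /robots.txt is not flagged, while one that goes straight to /forum/ is reported as "never fetched robots.txt".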
[66.102.7.104...]
No cache. I wonder how Google is getting to all the pages of WebmasterWorld.
[216.239.57.104...]
No page of WebmasterWorld has a cache, but the pages are still getting indexed, maybe through links from outside.
AjiNIMC