Forum Moderators: DixonJones
Given the prevalence of automated visitors these days, which do little but clutter our web logs with spurious entries, am I safe to treat any visit that begins with a robots.txt hit as that of a 'bot' of some description, regardless of what it then appears to do or what its user-agent is?
Clearly the above excludes the curious web site rambling of the professional human webmaster, but then they don't count as customers anyway ;-)
Any thoughts, people? Should I add it to my filter?
Steve
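A rough sketch of the filter being asked about, in Python. It assumes the log has already been reduced to (visitor, path) pairs in chronological order, and it treats everything from one visitor key (say, an IP address) as a single visit, which is a simplification. The function name and the sample data are made up for illustration.

```python
def visitors_starting_with_robots(requests):
    """Return the visitors whose first recorded request was /robots.txt.

    `requests` is an iterable of (visitor, path) pairs in chronological
    order. Grouping by visitor key alone is an assumption; a real filter
    would probably also split visits by time gap.
    """
    first_path = {}
    for visitor, path in requests:
        # setdefault keeps only the first path seen for each visitor
        first_path.setdefault(visitor, path)
    return {v for v, p in first_path.items() if p.lower() == "/robots.txt"}


# Hypothetical sample data, not taken from any real log
sample = [
    ("10.0.0.1", "/robots.txt"),
    ("10.0.0.2", "/index.html"),
    ("10.0.0.1", "/index.html"),
    ("10.0.0.2", "/robots.txt"),
]
flagged = visitors_starting_with_robots(sample)
```

Here only 10.0.0.1 is flagged, because 10.0.0.2 asked for robots.txt after it had already requested a page.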
User-agent: Named-bot
Disallow: /
It's the ones that don't request the robots.txt and spider your site anyway that are the pests.
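If you want to check what a record like the one above actually does, Python's standard urllib.robotparser will tell you. This is only a sketch: "Named-bot" is the placeholder from the record, while "Other-bot" and the path are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# The two-line record from the post; "Named-bot" is a placeholder name
ROBOTS_TXT = """\
User-agent: Named-bot
Disallow: /
"""


def is_allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


named = is_allowed("Named-bot", "/index.html")   # the excluded bot
other = is_allowed("Other-bot", "/index.html")   # hypothetical other bot
```

The named bot is refused everywhere and every other agent is still allowed. Of course, this only describes what a well-behaved bot should do; it does nothing about agents that never read the file.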
Are there circumstances where a normal web surfer would generate a hit on the robots.txt file?
Normal web surfer? Probably not. Experienced webmaster? Yes. Someone looking to hack you? Possibly.
Be careful what you exclude in your robots.txt file. If security is involved, the content should be in a password-protected folder with no mention of it in the robots.txt file.
Hmmm, did I answer your question? ;)
If I didn't, keyplyr did.
Are there circumstances where a normal web surfer would generate a hit on the robots.txt file?
AFAIK yes, the "make available offline" function of IE (post 5.5) checks robots.txt:
2004-02-24 20:20:28 80.218.91.124 - XXX.YY.NN.MMM 80 GET /robots.txt - 404 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; MSIECrawler)
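A minimal sketch of picking such a line apart in Python. The field order (date, time, client IP, user name, server IP, port, method, path, query, status, user-agent) is inferred from the sample line above, not from any log configuration, so treat it as an assumption; the function names are made up.

```python
FIELDS = ["date", "time", "client_ip", "user", "server_ip", "port",
          "method", "path", "query", "status", "user_agent"]


def parse_log_line(line):
    """Split one log line into a dict, or return None if it is too short.

    The first ten fields are whitespace-separated; whatever is left is
    taken as the user-agent, which contains spaces in the sample line.
    """
    parts = line.split(None, len(FIELDS) - 1)
    if len(parts) < len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def looks_automated(entry):
    """Flag a robots.txt request or a user-agent containing MSIECrawler."""
    return (entry["path"].lower() == "/robots.txt"
            or "msiecrawler" in entry["user_agent"].lower())


line = ("2004-02-24 20:20:28 80.218.91.124 - XXX.YY.NN.MMM 80 GET "
        "/robots.txt - 404 Mozilla/4.0 (compatible; MSIE 6.0; "
        "Windows NT 5.1; .NET CLR 1.1.4322; MSIECrawler)")
entry = parse_log_line(line)
```

Note that a hit like this comes from a real person's browser fetching pages on their behalf, so whether to count it as a 'bot' visit is a judgment call.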
And pageoneresults, I am keen not to stop them spidering, as it is very interesting to see what is going on on the site. Of course, I am beginning to realise there are some agents that I should definitely exclude using the robots.txt file, as they are patently up to no good.
Thanks all.
Steve