
robots.txt being a php file

an efficient way to detect robots?



12:29 pm on Apr 17, 2007 (gmt 0)

5+ Year Member

Hi there, it's all in the topic title.
I'm thinking about setting up the robots.txt file to actually be a PHP file. Through this I plan to add every requesting IP address to an internal DB of not-yet-identified robot IPs.
This would be combined with the usual rules of spider detection: already-known IP addresses, host and user agent. The goal of this fourth check, on access to robots.txt, is to catch a bot that would have passed the first three tests...
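Something like this minimal sketch is what I have in mind (assuming Apache with mod_rewrite; the database, table and column names are just placeholders I made up):

# .htaccess: serve robots.txt from a PHP script
RewriteEngine On
RewriteRule ^robots\.txt$ robots.php [L]

<?php
// robots.php: log every IP that requests robots.txt, then output the real rules.
$ip    = $_SERVER['REMOTE_ADDR'];
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

try {
    $db = new PDO('mysql:host=localhost;dbname=botlog', 'user', 'pass');
    $stmt = $db->prepare(
        'INSERT INTO robots_txt_hits (ip, user_agent, hit_time) VALUES (?, ?, NOW())'
    );
    $stmt->execute(array($ip, $agent));
} catch (PDOException $e) {
    // A logging failure should never break robots.txt itself.
}

// Send the normal robots.txt content.
header('Content-Type: text/plain');
echo "User-agent: *\n";
echo "Disallow: /admin/\n";
?>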

Do you think that the search engines' hidden bots, the ones they use to detect cloaking (coming from unlisted IPs, with a hidden agent and hidden host), will still access the robots.txt?

volatilegx explained that he also uses some additional checks of his own to identify specific search-engine behaviour, such as never requesting the CSS files, etc... I also read somewhere else that search engines must always access the robots.txt file. Do you think this method based on robots.txt access can be relied on?
I'm just afraid this is precisely the kind of behaviour they would not reproduce when trying to hide themselves, since the goal of such a visit is to compare what a hidden bot sees against what their official bot gets.
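(For the "already known IP address, host and agent" checks I mention above, what I have in mind is the usual forward-confirmed reverse DNS test, roughly like this; the function name is just my own illustration:)

// Verify a claimed Googlebot by double reverse/forward DNS lookup.
function looks_like_googlebot($ip) {
    $host = gethostbyaddr($ip);                      // reverse lookup
    if (!preg_match('/\.(googlebot|google)\.com$/', $host)) {
        return false;                                // rDNS must end in googlebot.com or google.com
    }
    return gethostbyname($host) === $ip;             // forward lookup must point back to the same IP
}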

any opinions on this matter?



9:18 pm on Apr 18, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Welcome to WebmasterWorld, grant_green :)

It's not a good way to verify bots. Sneaky bots won't check the robots.txt file, and even regular bots won't check it every time they request a file.


9:20 am on Apr 20, 2007 (gmt 0)

5+ Year Member

Thanks for the tips, volatilegx! This is helpful. Nice board, BTW :)
