Forum Moderators: goodroi

denying the fakers

9:55 pm on Jun 2, 2016 (gmt 0)

lucy24, Senior Member from US

joined:Apr 9, 2011
votes: 887

This will be a bit of a rhetorical question, but ...

Assume for the sake of discussion that a robot, for reasons best known to itself, comes around with a fake humanoid UA--Firefox/11 or the like--but that this robot still wishes to obey robots.txt. (I did say it was a rhetorical question.) How would you identify said robot in robots.txt?

My first thought was "User-Agent: Mozilla" but it turns out that plenty of reputable* robots, up to and including the Googlebot, start their UA strings that way. At least one version of the mobile Googlebot calls itself Chrome. Would it work to say "Firefox" or possibly "Windows"? There do exist a handful of robots whose UA string says "Firefox" along with the robot name, but those are rare enough to handle on a case-by-case basis. Besides, I'm pretty sure I already block them.
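
To make the overlap concrete, here is a rough Python sketch. It assumes a hypothetical compliant liar that compares each robots.txt "User-Agent:" token against its whole spoofed UA string as a case-insensitive substring. Real robots normally match on their own product token instead, so this matching rule is purely an assumption, and the UA strings are illustrative samples rather than a current list.

```python
# Illustrative UA strings (samples only, not an exhaustive or current list).
SAMPLE_UAS = {
    "fake Firefox/11": "Mozilla/5.0 (Windows NT 6.1; rv:11.0) Gecko/20100101 Firefox/11.0",
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "mobile Googlebot": (
        "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
        "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
        "+http://www.google.com/bot.html)"
    ),
}


def token_matches(token, ua):
    """Assumed rule: case-insensitive substring match on the full UA string."""
    return token.lower() in ua.lower()


def who_matches(token):
    """Names of the sample UAs that a given robots.txt token would catch."""
    return [name for name, ua in SAMPLE_UAS.items() if token_matches(token, ua)]


for token in ("Mozilla", "Chrome", "Firefox", "Windows"):
    print(token, "->", who_matches(token))
```

Under that assumed rule, "Mozilla" catches all three samples, "Chrome" catches only the mobile Googlebot sample, and "Firefox" or "Windows" catch only the fake humanoid.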

Or is the very idea of a compliant liar so far-fetched that the answer must remain unknown and unknowable?

* I originally said "respectable", but decided that "reputable" allows more wiggle room.

11:05 pm on Jun 2, 2016 (gmt 0)

not2easy, Administrator from US

joined:Dec 27, 2006
votes: 363

That would be nice. From what I see, it looks more like the humanoid visits that request robots.txt just want to see where to look for things. (hmm, what's disallowed?) The same humanoid UA/IP combos ignore the generic bot Disallow instructions and often change clothes between requests.

I mean, the IP/UA that requested the robots file shows up a second or two later requesting some page; a second or two after that, the same IP requests something with a different UA; then the original IP/UA is back again and usually wants the same page. Maybe 3-4 seconds start to finish. Botnet? Incompetent botrunner? Beats me, but I don't give them the benefit of the doubt. If a humanoid requests robots.txt more than once, they will need to ask for entrance. I think it is only fair. And like you, I find most of them are already blocked.
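
For anyone who wants to look for the same pattern in their own logs, a minimal Python sketch follows. It assumes the log has already been parsed into (seconds, IP, UA, path) tuples; the sample entries, the `suspicious_ips` name, and the 5-second window are all made up for illustration, not taken from any real log or tool.

```python
from collections import defaultdict

# Hypothetical, already-parsed log entries: (unix_seconds, ip, user_agent, path).
# 203.0.113.x is a documentation-only address range.
SAMPLE_LOG = [
    (100, "203.0.113.7", "Firefox/11", "/robots.txt"),
    (101, "203.0.113.7", "Firefox/11", "/page.html"),
    (102, "203.0.113.7", "MSIE 8.0", "/other.html"),
    (103, "203.0.113.7", "Firefox/11", "/page.html"),
    (200, "203.0.113.9", "Firefox/45", "/robots.txt"),
    (260, "203.0.113.9", "Firefox/45", "/index.html"),
]


def suspicious_ips(entries, window=5):
    """Return sorted IPs that fetched robots.txt and then showed up
    with a different UA within `window` seconds of that fetch."""
    by_ip = defaultdict(list)
    for ts, ip, ua, path in entries:
        by_ip[ip].append((ts, ua, path))

    flagged = set()
    for ip, hits in by_ip.items():
        for ts, ua, path in hits:
            if path != "/robots.txt":
                continue
            # Compare every hit from this IP against the robots.txt fetch.
            for later_ts, later_ua, _ in hits:
                if 0 <= later_ts - ts <= window and later_ua != ua:
                    flagged.add(ip)
    return sorted(flagged)


print(suspicious_ips(SAMPLE_LOG))
```

On the sample data only 203.0.113.7 is flagged, since it changes clothes two seconds after fetching robots.txt; the second IP keeps one UA throughout and is left alone.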