Hey, it is him. I looked up his profile in the venue where I used to know him--from which he seems to have departed almost as long ago as I did--and it's got a link to that selfsame pbm dot com. And a photograph. How funny. (Now, what's really funny is how long I spent poring over photographs and biographical details before realizing that the pbm was staring me in the face all along. Never mind!)
Well, just for that I've poked a preemptive hole for the robot. It came by yesterday (the 10th), asked for robots.txt, got redirected* (wrong www, I guess), asked again in the right place, and only then asked for the front page--which was denied on header grounds. A lot of robots that get redirected when requesting robots.txt then go ahead and ask for the front page--at that same wrong hostname--before they get around to following up the robots.txt redirect. Harrumph.
* On my personal site, which is https, robots.txt is exempt from redirection because some respectable robots seemed confused by the change. My “real” site has no such exemption, since it’s only a matter of with/without www and this doesn't seem to bother the robots.
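For anyone wondering what that exemption amounts to, here is a rough sketch in Python. It is not the actual server config on either site: the hostname, the function name `redirect_target` and the `EXEMPT_PATHS` set are all made up for illustration. The idea is just that every request gets sent to the canonical scheme and host, except for paths on the exempt list.

```python
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"   # hypothetical hostname
EXEMPT_PATHS = {"/robots.txt"}       # paths served in place, never redirected


def redirect_target(scheme, host, path, exempt_paths=EXEMPT_PATHS):
    """Return the URL to redirect to, or None to serve the request as-is."""
    if path in exempt_paths:
        # Exempt paths are answered wherever they were asked for.
        return None
    if scheme == CANONICAL_SCHEME and host.lower() == CANONICAL_HOST:
        # Already at the canonical address.
        return None
    return f"{CANONICAL_SCHEME}://{CANONICAL_HOST}{path}"
```

Passing an empty `exempt_paths` gives the behavior of the second site, where robots.txt gets redirected like everything else.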
Yes, it's me. My crawler is supposed to follow up to 5 robots.txt redirects before asking for any pages. If you send me a bug report with more details I'll be happy to look into it -- it's a brand-new crawler, and I haven't gotten any feedback on it from anyone external yet.
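To make the intended behavior concrete, here is a minimal sketch in Python of resolving robots.txt through a bounded number of redirects before touching any page. It is an illustration, not the crawler's real code: `resolve_robots_txt` and the `fetch` callable (which is assumed to return a `(status, location, body)` tuple without following redirects itself) are hypothetical names. Only the limit of 5 comes from the description above.

```python
from urllib.parse import urljoin

MAX_ROBOTS_REDIRECTS = 5
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def resolve_robots_txt(start_url, fetch, max_redirects=MAX_ROBOTS_REDIRECTS):
    """Follow robots.txt redirects and return (final_url, status, body).

    `fetch(url)` is a hypothetical helper that performs one request
    and returns (status, location_header_or_None, body).
    """
    url = start_url
    # One initial request plus at most `max_redirects` follow-ups.
    for _ in range(max_redirects + 1):
        status, location, body = fetch(url)
        if status not in REDIRECT_STATUSES or not location:
            return url, status, body
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError(f"more than {max_redirects} redirects for robots.txt")
```

The point is that a caller only goes on to request pages after this function returns, and uses the host of the final URL rather than the one it started with.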
Well, I can't say anything about the robot's behavior yet, because on the first visit it was blocked. I don't have one of those fancy robots.txt programs that looks at the visitor's name, checks whether it is authorized, and if not, generates a Disallow: line on the fly. So first-time visitors will not find a Disallow, except for specified directories; instead they'll be physically barred on header grounds unless and until I poke a hole.
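For the curious, the kind of "fancy" robots.txt program I mean could look something like the Python sketch below. This is not something I run. The robot names in `AUTHORIZED_ROBOTS`, the directory in `STATIC_DISALLOWS` and the function name `robots_txt_for` are placeholders, and real setups will differ in how they match the User-Agent.

```python
AUTHORIZED_ROBOTS = ("googlebot", "bingbot")   # placeholder allow-list
STATIC_DISALLOWS = ("/private/",)              # placeholder directories


def robots_txt_for(user_agent):
    """Build a robots.txt body tailored to the requesting User-Agent."""
    ua = user_agent.lower()
    lines = ["User-agent: *"]
    if any(name in ua for name in AUTHORIZED_ROBOTS):
        # Known robots only see the usual directory exclusions.
        lines += [f"Disallow: {path}" for path in STATIC_DISALLOWS]
    else:
        # Anyone else is told to stay out of the whole site.
        lines.append("Disallow: /")
    return "\n".join(lines) + "\n"
```

With something like this, a first-time visitor would at least be told it is unwelcome before hitting the header-based block.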
Then again I may not be the most useful feedback-supplier, since I generally don't much care what a robot plans to do with the information it finds, unless it's a blatant top-to-bottom full-site scraper.