I had a visitor come in from one of the search engines. He looked around a little and left. Then he came back, still behaving like a human, but shortly afterwards his behavior turned robotic and he started pulling page after page at a rate of at least one per second. When he first came in, he was accessing pages via GET. After turning robotic, he accessed each page first by HEAD and then by GET. He wandered off into sections from which I had banned all bots and was repetitively pulling the same pages. While I slept, he alone used as much bandwidth as all other visitors use in a three-day period. I cut him off and banned him after waking up.
He used the same User-Agent both as a human and as a robot: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)". Looking at the logs, the behavior changed so suddenly that it was as if a switch had been flipped.
I'd appreciate it if someone could tell me whether Mozilla has a site capture capability and whether there are any already written scripts to stop or at least slow down such activity. If not, I guess I'll have to write one.
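For anyone who ends up writing their own, here is a minimal sketch of one way to slow down a client that pulls a page per second or faster: a per-IP sliding-window counter. It is not an existing script from this thread. The class name `RateLimiter`, the method `allow`, and the limits (10 requests per 10 seconds) are assumptions chosen for illustration, and the code would still need to be hooked into whatever serves your pages.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most max_hits requests per client within `window` seconds.

    Hypothetical helper for illustration; the limits are arbitrary.
    """

    def __init__(self, max_hits=10, window=10.0):
        self.max_hits = max_hits
        self.window = window
        self.hits = defaultdict(deque)  # client IP -> times of recent requests

    def allow(self, ip, now=None):
        """Return True if the request may proceed, False if it is over the limit."""
        now = time.time() if now is None else now
        recent = self.hits[ip]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False  # caller can delay the response or refuse it
        recent.append(now)
        return True
```

A caller would check `limiter.allow(client_ip)` before serving each page and delay or refuse the request when it returns False. A human reading pages rarely trips a limit like this, while a page-per-second crawl does within seconds.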
It's not a Mozilla browser; it's a bot dressed up as one. The tricky thing with such bots is that they can send a User-Agent string to the web site that looks just like an ordinary Mozilla browser's.
>> any already written scripts to stop or at least slow down such activity
You could try the bad-bot script. You place it at a location that you have disallowed for all user agents in your robots.txt file, and then link to that location from an odd place using a gif or something similar, so that humans are unlikely to follow the link. A client that requests the page anyway has ignored robots.txt and can be banned. Here's the thread with jdMorgan's changes:
[webmasterworld.com...]
/claus
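The trap idea described above can be sketched roughly as follows. This is not the bad-bot script from the linked thread, only an illustration of the same principle under stated assumptions: the trap path `/bot-trap/`, the `BotTrap` class, and the in-memory ban set are all hypothetical, and a real setup would persist the ban list (for example in the server's access rules).

```python
TRAP_PATH = "/bot-trap/"  # hypothetical path; must match a Disallow line in robots.txt

# What robots.txt would need to contain so that well-behaved bots stay out.
ROBOTS_TXT = "User-agent: *\nDisallow: " + TRAP_PATH + "\n"


class BotTrap:
    """Ban any client that requests the disallowed trap path."""

    def __init__(self, trap_path=TRAP_PATH):
        self.trap_path = trap_path
        self.banned = set()  # IPs that walked into the trap

    def check(self, ip, path):
        """Return the HTTP status to send for this request: 200 or 403."""
        if path.startswith(self.trap_path):
            # Only a client ignoring robots.txt (or following the hidden
            # link) should ever get here.
            self.banned.add(ip)
        return 403 if ip in self.banned else 200
```

Once an address has requested the trap path, every later request from it gets a 403, while other visitors are unaffected. Because the trap link is hidden from humans and disallowed in robots.txt, compliant crawlers and ordinary visitors should not trigger it.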
This guy, though, came to me via a search engine. When he was first looking around, pages were loading just like they do for regular humans. Besides the page's content, there were log entries for the graphics, JavaScript, and stylesheet each time he accessed a page. After he came back and his behavior turned robotic, the logs showed only requests for page content and none for the rest. He hit the image directory later.
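The two log patterns described here (pages fetched without their graphics, JavaScript, or stylesheet, and each page requested by HEAD and then by GET) could be checked with something like the sketch below. The function name `looks_robotic`, the list of asset suffixes, and the thresholds are assumptions made for illustration, and the input is assumed to be one client's requests already parsed from the log as (method, path) pairs. It is a rough heuristic: a bot that later fetches the image directory, as this one did, would slip past the first check.

```python
ASSET_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed asset types


def looks_robotic(requests, min_pages=5):
    """Guess whether one client's requests look automated.

    requests: list of (method, path) tuples for a single client, in log order.
    """
    pages = [(m, p) for m, p in requests if not p.lower().endswith(ASSET_SUFFIXES)]
    assets = len(requests) - len(pages)
    # Count page GETs immediately preceded by a HEAD for the same path.
    head_then_get = sum(
        1
        for (m1, p1), (m2, p2) in zip(pages, pages[1:])
        if m1 == "HEAD" and m2 == "GET" and p1 == p2
    )
    page_gets = sum(1 for m, _ in pages if m == "GET")
    if page_gets < min_pages:
        return False  # too little traffic to judge
    no_assets = assets == 0
    mostly_head_first = head_then_get * 2 >= page_gets
    return no_assets or mostly_head_first
```

A browser session normally mixes page requests with requests for images, scripts, and stylesheets, so it fails both checks. A session made up only of HEAD and GET pairs for page content passes both.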
I sent a complaint to his ISP, so I'll see what happens with them. Maybe they'll tell me whether or not he was a human at first.