
Does the Mozilla Browser Have Site Capture Capability?


BlueSky

8:42 pm on Sep 27, 2003 (gmt 0)

10+ Year Member



I don't use the Mozilla browser except occasionally, to see how a webpage looks in it, and I don't have the latest version. I'd like to know from a regular user whether it has any site capture capability or a plugin that provides it.

I had a visitor come in from one of the search engines. He looked around a little and left. Then he came back, still as a human, but shortly afterwards his behavior turned robotic and he started pulling page after page, at least one per second. When he first came in, he was accessing pages via GET. After turning robotic, he requested each page first with HEAD and then with GET. He wandered into sections from which I had banned all bots and kept pulling the same pages repeatedly. While I slept, he alone used as much bandwidth as all other visitors use in a three-day period. When I woke up, I cut him off and banned him.
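A rough way to spot this pattern after the fact is to scan the access log per IP for the two signals described above: GETs that immediately follow a HEAD for the same path, and bursts of several requests within the same second. The sketch below assumes Apache Common/Combined Log Format lines; the function names are hypothetical, and neither signal proves a bot on its own.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Matches the start of a Common/Combined Log Format line (assumed log format).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+)'
)

def parse(line):
    """Return (ip, timestamp, method, path), or None if the line doesn't match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    t = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return m["ip"], t, m["method"], m["path"]

def head_then_get_ratio(lines):
    """Per IP: fraction of GETs immediately preceded by a HEAD for the same path."""
    last = {}                     # ip -> (method, path) of that IP's previous request
    gets = defaultdict(int)
    paired = defaultdict(int)
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        ip, _, method, path = rec
        if method == "GET":
            gets[ip] += 1
            if last.get(ip) == ("HEAD", path):
                paired[ip] += 1
        last[ip] = (method, path)
    return {ip: paired[ip] / gets[ip] for ip in gets}

def peak_per_second(lines):
    """Per IP: the largest number of requests logged within a single second."""
    per_second = Counter()        # CLF timestamps already have one-second resolution
    for line in lines:
        rec = parse(line)
        if rec is not None:
            ip, t, _, _ = rec
            per_second[(ip, t)] += 1
    peaks = {}
    for (ip, _), n in per_second.items():
        peaks[ip] = max(peaks.get(ip, 0), n)
    return peaks
```

An IP whose ratio is close to 1.0 and whose per-second peak stays high over a long stretch is a candidate for a closer look.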

He used this user-agent string both as a human and as a robot: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)". Judging from the logs, the behavior changed so suddenly that it was as if a switch had been flipped.

I'd appreciate it if someone could tell me whether Mozilla has a site capture capability and whether there are any ready-made scripts to stop, or at least slow down, such activity. If not, I guess I'll have to write one.
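If you do end up writing your own, one common way to slow a client down rather than ban it outright is a per-IP token bucket. This is a swapped-in technique, not anything from the scripts mentioned in this thread; the class name and the rate/capacity values are hypothetical.

```python
class Throttle:
    """Per-IP token bucket: each IP may burst up to `capacity` requests,
    then averages at most `rate` requests per second."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.state = {}             # ip -> (tokens, timestamp of last update)

    def allow(self, ip, now):
        """Return True if this request may proceed, False if it should be throttled."""
        tokens, last = self.state.get(ip, (self.capacity, now))
        # Refill for the time elapsed since this IP's last request, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.state[ip] = (tokens - 1, now)
            return True
        self.state[ip] = (tokens, now)
        return False
```

A server hook would call `allow(ip, time.monotonic())` for each request and answer with a 503 or a short delay when it returns False. A human reading pages rarely exceeds a modest rate, while a grabber pulling one page per second around the clock quickly runs out of tokens.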

claus

8:59 pm on Sep 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> if Mozilla has a site capture capability

It's not a Mozilla browser; it's a bot dressed up as one. The tricky thing with such bots is that they can send the web site a User-Agent string that looks like one from an ordinary Mozilla browser.

>> any already written scripts to stop or at least slow down such activity

You could try the bad-bot script. It works by placing it at a location that you have disallowed to all robots in your robots.txt file, and then linking to that location from a GIF or something else in an odd place, so that humans are unlikely to follow the link. Here's the thread with jdMorgan's changes:

[webmasterworld.com...]
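The trap idea can be sketched in a few lines. This is not the bad-bot script from the linked thread, just a minimal illustration of the same principle: the trap path, the hidden-link markup, and the `BotTrap` class are all hypothetical, and a real setup would persist the ban list (for example into .htaccess) rather than keep it in memory.

```python
TRAP_PREFIX = "/bot-trap/"   # hypothetical path; must match the Disallow line below

# robots.txt content: well-behaved robots will never request the trap.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PREFIX}\n"

# A link humans are unlikely to notice or click (a 1x1 GIF in an odd spot).
HIDDEN_LINK = f'<a href="{TRAP_PREFIX}"><img src="/img/dot.gif" width="1" height="1" alt=""></a>'

class BotTrap:
    def __init__(self):
        self.banned = set()

    def handle(self, ip, path):
        """Return the HTTP status code to send for this request."""
        if ip in self.banned:
            return 403
        if path.startswith(TRAP_PREFIX):
            # Only a client that ignores robots.txt and follows the hidden link gets here.
            self.banned.add(ip)
            return 403
        return 200
```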

/claus

BlueSky

9:34 pm on Sep 27, 2003 (gmt 0)

10+ Year Member



Thanks for the link.

This guy, though, came to me via a search engine. When he was first looking around, pages were loading just as they do for regular humans: besides each page's content, the logs show entries for the graphics, JavaScript, and stylesheet loaded with that page. After he came back and his behavior turned robotic, the logs showed only page content and none of the rest. He hit the image directory later.
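The difference described here (a browser fetches images, scripts, and stylesheets along with each page, while a grabber often fetches only the pages) can be measured as assets per page for each IP. A minimal sketch follows; the extension list and function name are assumptions, and browser caching means a human's ratio drops on later pages, so treat a ratio near zero over many pages as a hint rather than proof.

```python
from collections import Counter

# Assumed set of "supporting file" extensions; adjust for your site.
ASSET_EXTS = (".gif", ".jpg", ".jpeg", ".png", ".js", ".css")

def asset_ratio(requests):
    """requests: iterable of (ip, path). Returns ip -> assets fetched per page."""
    pages = Counter()
    assets = Counter()
    for ip, path in requests:
        clean = path.split("?", 1)[0].lower()   # ignore query strings and case
        if clean.endswith(ASSET_EXTS):
            assets[ip] += 1
        else:
            pages[ip] += 1
    # Only IPs that requested at least one page appear in the result.
    return {ip: assets[ip] / pages[ip] for ip in pages}
```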

I sent a complaint to his ISP. So, I'll see what happens with them. Maybe they'll tell me whether or not he was a human at first.

Giljorak

1:23 am on Sep 28, 2003 (gmt 0)

10+ Year Member



I run Mozilla Firebird, and there is a plugin called Spiderzilla that lets you download complete websites. There are other plugins that let you change your user-agent as well.

adamas

8:18 am on Sep 29, 2003 (gmt 0)

10+ Year Member



It's not a Mozilla user agent. It's a (possibly faked) IE user agent.

BjarneDM

8:33 am on Sep 29, 2003 (gmt 0)

10+ Year Member



1) All Gecko-based browsers identify themselves as Mozilla/5.0, like this:
Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.5; MultiZilla v1.5.0.3a) Gecko/20030925
2) Every current version of IE identifies itself as Mozilla/4.0
3) Mozilla can save pages one by one
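The first two rules can be turned into a quick check on the string a client claims. This is a rough sketch based only on those two rules; the function name and category labels are hypothetical, and as noted earlier in the thread the User-Agent is trivially faked, so this classifies what a client says it is, not what it actually is.

```python
import re

def classify_user_agent(ua):
    """Classify a User-Agent string as 'gecko', 'msie', or 'other'."""
    # Rule 1: Gecko-based browsers send Mozilla/5.0 plus a Gecko/<build date> token.
    if ua.startswith("Mozilla/5.0") and "Gecko/" in ua:
        return "gecko"
    # Rule 2: IE sends Mozilla/4.0 (compatible; MSIE <version>; ...).
    if ua.startswith("Mozilla/4.0") and re.search(r"MSIE \d", ua):
        return "msie"
    return "other"
```

Applied to the string from the first post, this returns "msie", which is adamas's point: the visitor claimed to be IE, not Mozilla.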