Default User Agents Discussion

         

wilderness

2:15 pm on Apr 14, 2008 (gmt 0)

System: The following message was cut out of thread at: http://www.webmasterworld.com/search_engine_spiders/3626003.htm by incredibill - 11:32 am on April 14, 2008 (PST -8)


You fellas have done a fine job of inclusion.

larbin is another, FrontPage as well.

With a little broadening of the category, you might also consider the numerous link checkers.

incrediBILL

7:37 pm on Apr 14, 2008 (gmt 0)

Don,

Larbin doesn't qualify for this particular information thread because it's an actual crawler itself, not a programming library or command-line tool used to make crawlers.

Larbin is an HTTP Web crawler with an easy interface that runs under Linux. It can fetch more than 5 million pages a day on a standard PC (with a good network).

We may do another thread later about open-source and commercially available crawlers and such: larbin, Nutch, Heritrix, etc. on the open-source side, and the Google appliance, a bunch of offline readers, and other stuff on the commercial side.

incrediBILL

8:39 am on Apr 16, 2008 (gmt 0)

Can anyone think of any other default UAs?

I'm drawing a blank now...

Mokita

11:23 am on Apr 16, 2008 (gmt 0)

Can anyone think of any other default UAs?

Please define a "default UA".

Hobbs

1:22 pm on Apr 16, 2008 (gmt 0)

Please define a "default UA"

That would be the user agent that shows up in your logs from scripts (or development libraries) crawling your pages, where the user agent setting was left at its default by sloppy scrapers that you will block in htaccess.

:-)
or left at default settings by hard-working folks developing useful web applications that you will be blocking too.

:-))

Mokita

1:32 pm on Apr 16, 2008 (gmt 0)

Like "grub" or "nutch"?

incrediBILL

7:01 pm on Apr 16, 2008 (gmt 0)

Like "grub" or "nutch"?

Those will fit the next thread about default User Agents for off-the-shelf crawlers.

At the moment we're just looking to compile a list of default user agents of libraries and command-line tools commonly used in spiders that don't identify themselves.
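
For anyone who wants to test log entries against such a list, here is a rough sketch in Python. The prefixes are only a small starter set of well-known library and tool defaults (they are not a list compiled in this thread), and the names `DEFAULT_UA_PREFIXES` and `is_default_ua` are made up for illustration.

```python
# Hypothetical starter list: User-Agent prefixes that a few common libraries
# and command-line tools send when the developer never overrides the default.
# This is an assumption for illustration, not the list from this thread.
DEFAULT_UA_PREFIXES = (
    "libwww-perl",    # Perl LWP
    "python-urllib",  # Python urllib
    "java/",          # Java's built-in HTTP client
    "wget",           # GNU Wget
    "curl",           # curl command-line tool
)


def is_default_ua(user_agent: str) -> bool:
    """Return True if the UA string starts with a known default prefix."""
    ua = user_agent.strip().lower()
    # str.startswith accepts a tuple and matches any of its entries.
    return ua.startswith(DEFAULT_UA_PREFIXES)


if __name__ == "__main__":
    for sample in ("libwww-perl/5.808", "Wget/1.10.2", "Mozilla/5.0 (Windows)"):
        print(sample, "->", is_default_ua(sample))
```

Prefix matching is used instead of exact matching because these defaults normally carry a version number after the name. As Hobbs points out, a match only means the setting was left alone, so it will catch useful applications along with the scrapers.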