Larbin is another, and so is FrontPage.
With a little broadening of the category?
You might also consider the numerous link checkers.
Larbin doesn't qualify for this particular information thread because it's an actual crawler itself, not a programming library or command line tool used to make crawlers.
Larbin is an HTTP Web crawler with an easy interface that runs under Linux. It can fetch more than 5 million pages a day on a standard PC (with a good network).
We may do another thread later about open-source and commercially available crawlers: larbin, Nutch, Heritrix, etc. on the open-source side, and the Google appliance, a bunch of offline readers, and other tools on the commercial side.
Please define a "default UA"
:-)
Or left at default settings by hard-working folks developing useful web applications, whom you will be blocking too.
:-))
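For anyone unsure what a "default UA" looks like in practice: it is the User-Agent header a library or command-line tool sends when the developer never sets one. The sketch below is only an illustration, not a recommended block list. The prefix list is a hypothetical sample, and the exact strings vary by tool and version. The one string it reads directly is Python's own `urllib` default, which an opener built with `urllib.request.build_opener()` carries in its `addheaders` attribute.

```python
import re
import urllib.request

# Hypothetical sample of User-Agent prefixes that common HTTP libraries and
# command-line tools send when left at their defaults. Exact strings vary by
# version; this is an illustration, not an authoritative list.
DEFAULT_UA_PATTERNS = [
    r"^Python-urllib/\d+\.\d+",
    r"^libwww-perl/",
    r"^Wget/",
    r"^curl/",
    r"^Java/",
]
_DEFAULT_UA_RE = re.compile("|".join(DEFAULT_UA_PATTERNS))


def looks_like_default_ua(user_agent: str) -> bool:
    """Return True if the header starts with one of the sample default prefixes."""
    return bool(_DEFAULT_UA_RE.match(user_agent.strip()))


if __name__ == "__main__":
    # urllib's opener stores its default User-agent header in addheaders.
    opener = urllib.request.build_opener()
    default_ua = dict(opener.addheaders)["User-agent"]
    print(default_ua, looks_like_default_ua(default_ua))
    # A tool that identifies itself is not matched by the sample patterns.
    print(looks_like_default_ua("MyLinkChecker/1.0 (+http://example.com/bot)"))
```

As the reply above points out, a match only tells you the header was never customized. It says nothing about whether the client is a bad bot or a useful application someone didn't bother to rename.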