Looksmart crawling with Mozilla/4.0 User-agent

Doesn't identify itself

         

jdMorgan

6:11 pm on Jun 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Silly robot using nondescript user-agent:

64.242.88.60 - - [09/Jun/2006:05:12:27 -0500] "GET / HTTP/1.0" 403 874 "-" "Mozilla/4.0"

IP resolves to sv-crawlfw4.looksmart.com

Did not fetch robots.txt
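
For anyone who wants to look for the same thing in their own logs, here is a minimal Python sketch. It assumes the Apache "combined" log format shown in the line above; the names LOG_PATTERN, parse_line and is_bare_mozilla are made up for illustration and are not part of any tool mentioned in this thread.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# A user-agent that is nothing but "Mozilla/x.y", with no comment
# or product token after it.
BARE_MOZILLA = re.compile(r'Mozilla/\d+\.\d+')


def parse_line(line):
    """Return the fields of one combined-format log line, or None."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_bare_mozilla(agent):
    """True when the user-agent is only 'Mozilla/x.y' and nothing else."""
    return BARE_MOZILLA.fullmatch(agent.strip()) is not None


line = ('64.242.88.60 - - [09/Jun/2006:05:12:27 -0500] '
        '"GET / HTTP/1.0" 403 874 "-" "Mozilla/4.0"')
hit = parse_line(line)
print(hit["ip"], hit["status"], is_bare_mozilla(hit["agent"]))
```

Run over a whole log file, the same two functions would list every IP that sends the bare string, which makes it easier to see which ones deserve a reverse DNS lookup.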

Considering that a spider is a rather critical thing, you'd think that companies would learn...
1) Provide a meaningful User-agent
2) Fetch and respect robots.txt
3) Provide a link to your Webmaster help/info page
4) Include info on using robots.txt to control your specific UA on that page (give the required robots.txt agent string).
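
Seen from the robot author's side, points 1, 2 and 4 could look roughly like the sketch below, using Python's standard urllib.robotparser. The agent string "LooksmartLinkChecker" and the robots.txt content are invented for illustration; Looksmart has published no such string that I know of.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish.
# "LooksmartLinkChecker" is an invented agent string.
ROBOTS_TXT = """\
User-agent: LooksmartLinkChecker
Disallow: /private/

User-agent: *
Disallow: /
"""


def may_fetch(agent, url, robots_txt=ROBOTS_TXT):
    """Ask whether `agent` is permitted to fetch `url` under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A robot with a meaningful name can be granted access by that name...
print(may_fetch("LooksmartLinkChecker/1.0", "http://example.com/"))
# ...while an anonymous "Mozilla/4.0" falls under the catch-all record.
print(may_fetch("Mozilla/4.0", "http://example.com/"))
```

The point is that a distinctive token gives the webmaster something to write a robots.txt record for; a bare "Mozilla/4.0" can only be handled by the catch-all.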

Now, this is probably a robot checking links in their directory -- my site was listed in the now-defunct Looksmart Zeal directory. And many directory-link-checkers don't fetch robots.txt, because they are not really 'crawling' the Web; they're simply checking a link they already have in their directory. And they usually only fetch the one page that they have listed in their directory -- they don't crawl your site. So I'd settle for only points 1, 3, and 4 on the list above. But c'mon guys, at least put "Looksmart" in the UA string somewhere, so I don't have to make special IP-based provisions to 'allow' this generic (and usually troublesome) User-agent...
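
To show what such a "special provision" involves, here is a hedged Python sketch of a forward-confirmed reverse DNS check. It is not my actual server configuration, only the logic: block the bare "Mozilla/4.0" unless the IP reverse-resolves to a looksmart.com host and that host resolves back to the same IP. The function names and the TRUSTED_SUFFIX constant are assumptions for illustration; the resolvers are passed in as parameters so the decision logic can be tried without touching the network.

```python
import socket

# Assumed from the reverse DNS name reported above.
TRUSTED_SUFFIX = ".looksmart.com"


def reverse_lookup(ip):
    """Reverse-resolve an IP to a hostname, or None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Resolve a hostname to its IPv4 addresses (empty list on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def allow_generic_agent(ip, agent,
                        reverse=reverse_lookup, forward=forward_lookup):
    """Decide whether a request may pass the generic-UA block."""
    if agent.strip() != "Mozilla/4.0":
        return True  # not the generic UA; other rules apply elsewhere
    host = reverse(ip)
    if not host:
        return False
    host = host.rstrip(".").lower()
    if not host.endswith(TRUSTED_SUFFIX):
        return False
    # Forward-confirm: the claimed hostname must resolve back to the
    # same IP, since anyone can set an arbitrary PTR record.
    return ip in forward(host)
```

With the default resolvers, allow_generic_agent("64.242.88.60", "Mozilla/4.0") performs two live DNS lookups per call, so in practice the result would want caching. All of this is exactly the extra work that one word in the UA string would save.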

Jim

incrediBILL

6:55 pm on Jun 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Today I found a hit from Looksmart that was very odd:

64.242.88.60
sv-crawlfw4.looksmart.com.
user agent is only "Mozilla/4.0"

I assume it's a crawler with that reverse DNS lookup, but it didn't hit robots.txt, nothing, just a single file.

Very odd.

Anyone else see this?

My historical archive shows the following other agents from their range of IPs:

64.242.88.60 "NutchCVS/0.05 (Nutch; [nutch.org...] nutch-agent@lists.sourceforge.net):

64.242.88.60 "Mozilla/4.0 compatible ZyBorg/1.0 (wn-14.zyborg@looksmart.net; [WISEnutbot.com)"...]

64.242.88.50 "Mozilla/4.0 compatible ZyBorg/1.0 Dead Link Checker (wn.dlc@looksmart.net; [WISEnutbot.com)"...]

jdMorgan

6:51 pm on Jun 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, your report and mine are essentially identical.

It's likely a directory-link-checker, so it doesn't fetch robots.txt (because it's not crawling your site, just checking a link it already has) and it only fetches the one URL that Looksmart has listed in their directory.

Therefore, my objection is simply that the generic "Mozilla/4.0" user-agent is usually trouble; it's used by a lot of content-scrapers. So, I'd be happier if they'd do a better job of identifying themselves as "Looksmart Directory" or something, instead of using that non-specific user-agent.

ZyBorg and WiseNut are generally well-behaved on my sites; it's just the anonymity of this new UA that's a problem.

Jim

bobothecat

7:30 pm on Jun 9, 2006 (gmt 0)



Therefore, my objection is simply that the generic "Mozilla/4.0" user-agent is usually trouble; it's used by a lot of content-scrapers. So, I'd be happier if they'd do a better job of identifying themselves as "Looksmart Directory" or something, instead of using that non-specific user-agent.

And Yahoo seems to be doing the same... :(

[webmasterworld.com...]

You'd think with the talent they employ (hopefully), they'd know this.