ScoutAbout

Any ideas?


themoff

9:24 am on Jul 13, 2001 (gmt 0)

10+ Year Member



Another UA that appeared in my logs - "ScoutAbout"
Came from 138.15.164.9 (zeus.nj.nec.com).
Did a Google search on it and found a homepage for it. Some kind of research tool; the description included "Capture complete Web pages for later viewing".

I'm still new to this side of web design. Is there any reason to co-operate with an unidentifiable UA? I.e., keep a list of specific UAs that are 'goodies', and assume everyone else is 'bad'?

Cheers, Robin
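The allowlist idea Robin asks about can be sketched roughly as below. This is a minimal illustration, assuming you match case-insensitive substrings of the User-Agent header; the names `TRUSTED_ROBOT_TOKENS` and `is_trusted` are hypothetical, and the two tokens are examples only, not a recommended list.

```python
# Minimal sketch of the allowlist idea: known-good robot UAs are trusted,
# everything else is reported as "unknown" for manual review.
# The tokens below are illustrative assumptions, not a vetted list.
TRUSTED_ROBOT_TOKENS = ("googlebot", "slurp")


def is_trusted(user_agent: str) -> bool:
    """Return True if the User-Agent contains an allowlisted token (case-insensitive)."""
    ua = user_agent.lower()
    return any(token in ua for token in TRUSTED_ROBOT_TOKENS)


if __name__ == "__main__":
    for ua in ("Googlebot/2.1", "ScoutAbout"):
        print(ua, "->", "trusted" if is_trusted(ua) else "unknown")
```

Keep in mind that the UA string is self-reported and trivially faked, and a default-deny rule applied to every UA would also shut out ordinary browsers, so a check like this only makes sense for requests you have already decided come from robots.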

skirril

3:54 pm on Jul 14, 2001 (gmt 0)

10+ Year Member



Welcome to WmW, then.

Generally, not all robots are bad, and you need to coexist with them somehow, since they're what gets you into the search engines' indexes.

Of course, there are things a "good robot" should do, which, IMO, are:

1) retrieve, analyse and comply with robots.txt and the robots meta tag

2) In the UA, give some form of identification or a way to send feedback (email address, website)

3) When retrieving pages, it shouldn't overload your server or spam you with requests. I consider it bad practice when a robot makes more than about 3 requests per 5 seconds.
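Skirril's third rule (no more than about 3 requests per 5 seconds) can be checked against a log with a per-host sliding window. A minimal sketch, assuming you have already extracted (host, Unix timestamp) pairs in time order; the function name and the `.example` hosts are hypothetical.

```python
from collections import defaultdict, deque

# Rule of thumb from the post: more than 3 requests within any
# 5-second window counts as hammering the server.
MAX_REQUESTS = 3
WINDOW_SECONDS = 5.0


def too_fast_hosts(requests):
    """Return the set of hosts that exceeded MAX_REQUESTS within WINDOW_SECONDS.

    `requests` is an iterable of (host, unix_timestamp) pairs in time order,
    e.g. extracted from an access log.
    """
    recent = defaultdict(deque)  # host -> timestamps inside the current window
    flagged = set()
    for host, ts in requests:
        window = recent[host]
        window.append(ts)
        # Discard timestamps that are WINDOW_SECONDS or more older than this request.
        while ts - window[0] >= WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_REQUESTS:
            flagged.add(host)
    return flagged


if __name__ == "__main__":
    sample = (
        [("fast.example", t) for t in (0, 1, 2, 3)]
        + [("slow.example", t) for t in (0, 600, 1200)]
    )
    print(too_fast_hosts(sorted(sample, key=lambda r: r[1])))
```

With this sample, `fast.example` makes 4 requests inside 5 seconds and is flagged; `slow.example`, pacing itself the way ScoutAbout reportedly does, is not.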

To come back to ScoutAbout, which I also have in my logfiles:
- 10 requests over a period of roughly 2 hours
- has not retrieved robots.txt so far, though it might still do so, given the pace it runs at.

So, no, I would not consider ScoutAbout a bad robot.

Skirril

Bolotomus

5:41 pm on Jul 15, 2001 (gmt 0)

10+ Year Member



How can you say ScoutAbout is a good robot, when it hasn't downloaded robots.txt???

Sure... it MIGHT download it later. But that's not the point! That's the *first* thing it's supposed to download, its way of asking permission to spider your site.

If it's not grabbing robots.txt then I say it's a bad robot and the authors should get an email protesting it, and it would not be an idle threat to warn that it will be banned from our servers until it complies.

Bolot

Marcia

3:06 am on Jul 18, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Had a visit today from ScoutAbout:

138.15.164.9 - - [17/Jul/2001:11:46:04 -0400] "GET / HTTP/1.1" 200 5826 "-" "ScoutAbout"

No, it did not request robots.txt and there is no information about where to email. It only requested one page, so it's not particularly invasive (not yet, anyway), but it still does not meet Bolot's criteria as a friendly bot.
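A line like the one Marcia posted is in Apache's "combined" log format, so the host, requested path and UA can be pulled out with a regular expression. A minimal sketch; the pattern and the `parse_line` name are my own assumptions, and lines with a malformed request field (e.g. `"-"`) simply won't match and return `None`.

```python
import re

# Apache "combined" log format:
# host ident user [time] "method path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line doesn't match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


line = ('138.15.164.9 - - [17/Jul/2001:11:46:04 -0400] '
        '"GET / HTTP/1.1" 200 5826 "-" "ScoutAbout"')
entry = parse_line(line)
print(entry["host"], entry["path"], entry["agent"])  # 138.15.164.9 / ScoutAbout
```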

I know it's probably irrelevant, and I'm probably paranoid, but the last part of the hostname, nec.com, has me wondering. I have a "Ready" computer put out by NEC. Naahh, couldn't be any connection.

volatilegx

8:07 pm on Mar 1, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I saw ScoutAbout today. It did request robots.txt. I still don't know who this UA belongs to.

wilderness

9:39 pm on Mar 1, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Quote: I still don't know who this UA belongs to

[proactiveresearch.com...]
redirects to
[researchrepublic.com...]

jdMorgan

2:40 am on Oct 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Some observations to add to this old thread...

I believe that ScoutAbout and Lachesis may be working together - at least the ones that come from zeus.nj.nec.com and hades.nj.nec.com.

ScoutAbout requests robots.txt, and sometimes "/", and then Lachesis comes along later and requests "/" and other pages, but not robots.txt.

That's what I'm seeing anyway.
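To check for the pattern Jim describes, one approach is to group log entries by host and record, per UA, whether that UA ever fetched /robots.txt. A minimal sketch, assuming (host, user_agent, path) tuples already parsed from the log; `robots_report` is a hypothetical name, and the sample below is made up to mirror the described pattern, not real log data.

```python
from collections import defaultdict


def robots_report(entries):
    """Map each host to {user_agent: whether that agent ever fetched /robots.txt}.

    `entries` is an iterable of (host, user_agent, path) tuples from an access log.
    Listing agents per host makes it visible when one UA fetches /robots.txt
    and a different UA from the same host fetches the pages.
    """
    report = defaultdict(dict)
    for host, agent, path in entries:
        already = report[host].get(agent, False)
        report[host][agent] = already or path == "/robots.txt"
    return {host: dict(agents) for host, agents in report.items()}


if __name__ == "__main__":
    # Made-up sample shaped like the pattern described above.
    sample = [
        ("zeus.nj.nec.com", "ScoutAbout", "/robots.txt"),
        ("zeus.nj.nec.com", "ScoutAbout", "/"),
        ("zeus.nj.nec.com", "Lachesis", "/"),
        ("zeus.nj.nec.com", "Lachesis", "/about.html"),
    ]
    print(robots_report(sample))
    # {'zeus.nj.nec.com': {'ScoutAbout': True, 'Lachesis': False}}
```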

I had given Lachesis the boot with a 403, but it's so brain-dead that it just comes back and re-issues the request a few minutes later. I have added both ScoutAbout and Lachesis to robots.txt, and I'll wait to see whether they collectively obey it, then put the 403 back on (or redirect them back to themselves) if necessary.
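Jim didn't post his exact rules, so the robots.txt below is an assumption: both agents are disallowed from the whole site and everyone else is left unrestricted. Python's standard `urllib.robotparser` can confirm how a compliant robot would read such a file. Note that this only helps if something actually fetches robots.txt; in the pattern above, that would have to be ScoutAbout acting on Lachesis's behalf.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: shut ScoutAbout and Lachesis out of the whole site,
# leave every other robot unrestricted.
ROBOTS_TXT = """\
User-agent: ScoutAbout
Disallow: /

User-agent: Lachesis
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("ScoutAbout", "Lachesis", "Googlebot"):
    print(agent, parser.can_fetch(agent, "/index.html"))
# ScoutAbout False
# Lachesis False
# Googlebot True
```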

Anybody else have more data to add to the pattern, or to break the pattern I'm seeing?

Jim