Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
FAST-WebCrawler/2.2.5 - Lycos/Alltheweb/Fast
Something very fishy about this one
jdMorgan




msg:400428
 9:37 pm on Mar 3, 2006 (gmt 0)

205.234.253.123 - - [03/Mar/2006:04:23:15 -0600] "GET /some_page.html HTTP/1.1" 403 1492 "-" "FAST-WebCrawler/2.2.5 - Lycos/Alltheweb/Fast"

The IP address resolves only as far as HostForWeb Inc. in Chicago, IL. Digging deeper, you get only:

Asking ns1.scservers.com. for 123.253.234.205.in-addr.arpa PTR record: Reports unknown.ord.scnet.net. [from 66.225.250.250]
where scnet.net no longer resolves.
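For anyone repeating this lookup: the PTR query name is just the IP's octets in reverse order under in-addr.arpa. A minimal sketch using Python's standard ipaddress module; the function name ptr_query_name is made up for illustration.

```python
import ipaddress

def ptr_query_name(ip: str) -> str:
    """Return the reverse-DNS (PTR) query name for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer

# The address from the log line above:
print(ptr_query_name("205.234.253.123"))  # 123.253.234.205.in-addr.arpa
```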

As I understand/remember it, Fast divested AllTheWeb a few years ago to concentrate on enterprise search solutions. And I'm not aware of any relationship at all between Lycos in MA and any company in IL.

Since it didn't fetch robots.txt, it didn't get anywhere, but keep an eye out for this one.

If it's legitimate, then it needs two emergency repairs: Fetch and obey robots.txt, and provide valid contact info in the UA string.
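One way to spot clients like this in your own logs is to list every IP that requested pages without ever requesting /robots.txt. The following is only a rough sketch, assuming Apache "combined" format log lines like the one quoted above; LOG_RE and clients_skipping_robots are hypothetical names, not part of any tool.

```python
import re
from collections import defaultdict

# Apache "combined" log format, matching the shape of the quoted log line.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def clients_skipping_robots(lines):
    """Map each IP that never requested /robots.txt to the UAs it sent."""
    fetched = set()              # IPs that asked for /robots.txt at least once
    agents = defaultdict(set)    # IP -> set of user-agent strings seen
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue             # skip lines that are not in combined format
        if m["path"] == "/robots.txt":
            fetched.add(m["ip"])
        agents[m["ip"]].add(m["ua"])
    return {ip: uas for ip, uas in agents.items() if ip not in fetched}

sample = [
    '205.234.253.123 - - [03/Mar/2006:04:23:15 -0600] '
    '"GET /some_page.html HTTP/1.1" 403 1492 "-" '
    '"FAST-WebCrawler/2.2.5 - Lycos/Alltheweb/Fast"',
]
print(clients_skipping_robots(sample))
```

Run it over a full day's log rather than a single line; a well-behaved spider may fetch robots.txt only once per session, so a short window will produce false alarms.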

Jim

 

vortech




msg:400429
 6:52 am on Mar 4, 2006 (gmt 0)

I've been seeing this bot in my logs for a little while now. At first it came with a user agent of just random letters, such as: l2axlqwvk2bguksmie+vbsvgaomkxbl - -

Then it started using this:
FAST-WebCrawler/2.2.5+-+Lycos/Alltheweb/Fast - -

So far it has come from:

72.2.24.xxx
85.13.206.xxx
83.142.29.xxx
205.234.253.xxx
66.148.68.xxx
72.232.67.xxx

It mostly uses the Fast UA now.

I can't confirm any of these IPs as belonging to Fast.

I've blocked them all by IP since the random UA can't be used.
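If you want to test an address against this list in a script, here is a minimal sketch with Python's standard ipaddress module. Treating each masked address as a whole /24 is an assumption on my part, and BLOCKED and is_blocked are illustrative names only.

```python
import ipaddress

# Prefixes reported in this thread. Expanding each masked last octet
# to a full /24 is an assumption, not something the posts confirm.
BLOCKED = [ipaddress.ip_network(net) for net in (
    "72.2.24.0/24",
    "85.13.206.0/24",
    "83.142.29.0/24",
    "205.234.253.0/24",
    "66.148.68.0/24",
    "72.232.67.0/24",
)]

def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any listed prefix."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)

print(is_blocked("205.234.253.123"))  # True
print(is_blocked("192.0.2.1"))        # False
```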

Has anyone else seen these IPs or UAs?

GaryK




msg:400430
 8:15 pm on Mar 5, 2006 (gmt 0)

I had a visit from this one last week. It did not request robots.txt and quickly fell into a spider trap. Sticky me if you want the last octet of the IP Address.

FAST-WebCrawler/2.2.5 - Lycos/Alltheweb/Fast
209.190.21.*

Partial WHOIS:

OrgName: Columbus Network Access Point, Inc.
OrgID: CNAP
Address: 50 W, Broad St, Suite 627
City: Columbus
StateProv: OH
PostalCode: 43215
Country: US
NetRange: 209.190.0.0 - 209.190.127.255
CIDR: 209.190.0.0/17
NetName: COLUMBUS-NAP
NetHandle: NET-209-190-0-0-1
Parent: NET-209-0-0-0-0
NetType: Direct Allocation
NameServer: NS1.NETSERVICE.THENAP.NET
NameServer: NS2.NETSERVICE.THENAP.NET
Comment: ADDRESSES WITHIN THIS BLOCK ARE NON-PORTABLE
RegDate: 1997-12-19
Updated: 2005-03-29
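The NetRange and CIDR lines in the WHOIS record say the same thing in two notations. A small sketch of the conversion using Python's standard ipaddress module; range_to_cidrs is a hypothetical helper name.

```python
import ipaddress

def range_to_cidrs(first: str, last: str) -> list:
    """Convert a first/last address pair into the covering CIDR blocks."""
    blocks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(block) for block in blocks]

# NetRange from the WHOIS record above:
print(range_to_cidrs("209.190.0.0", "209.190.127.255"))  # ['209.190.0.0/17']
```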

Pfui




msg:400431
 9:58 pm on Mar 5, 2006 (gmt 0)

FAST-related bots have been a plague (IMHO) for years. If I even see the word "FAST" in my logs, I practically start to twitch:)

Here's a mini assortment of UAs from my robots.txt, not that FAST reliably heeds them:

User-agent: FAST
User-agent: FAST Enterprise Crawler
User-agent: FAST-WebCrawler
User-agent: FAST MetaWeb Crawler
Disallow: /
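To check how a compliant parser would read those rules, here is a minimal sketch using Python's standard urllib.robotparser. It only shows one parser's interpretation (it matches the listed tokens as case-insensitive substrings of the product name); an actual crawler may match differently or ignore the file entirely. The helper name allowed is made up.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt group quoted above.
RULES = """\
User-agent: FAST
User-agent: FAST Enterprise Crawler
User-agent: FAST-WebCrawler
User-agent: FAST MetaWeb Crawler
Disallow: /
"""

def allowed(user_agent: str, path: str = "/") -> bool:
    """Return True if RULES permit user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(user_agent, path)

print(allowed("FAST-WebCrawler/2.2.5 - Lycos/Alltheweb/Fast", "/some_page.html"))
print(allowed("Lycos_Spider_(modspider)", "/"))
```

The first call reports the FAST user agent as disallowed; the second is allowed because no group in this fragment names it.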

Here are some older hits/hosts:

cr022r01-2.sac2.fastsearch.net - - [13/Oct/2002:11:20:21 -0700]
"FAST-WebCrawler/3.6 (atw-crawler at fast dot no; [fast.no...]

cr022r01-3.sac2.fastsearch.net - - [21/Jan/2004:10:40:32 -0800]
"FAST-WebCrawler/3.8 (crawler at trd dot overture dot com; [alltheweb.com...]

cr022r01-3.sac.overture.com - - [04/Apr/2004:15:06:38 -0700]
"FAST-WebCrawler/3.8 (crawler at trd dot overture dot com; [alltheweb.com...]

And here are a couple of the newest:

sch-fast-se-crawl01.dev.osl.basefarm.net - - [01/Mar/2006:00:58:31 -0800]
"GET /robots.txt HTTP/1.1"
"FAST Enterprise Crawler 6 used by Schibsted Sok (webcrawl@schibstedsok.no)"

216.255.229.241 - - [01/Mar/2006:08:31:28 -0800]
"GET /robots.txt HTTP/1.1"
"FAST Enterprise Crawler 6 used by FAST (iverjor (at) fast.no)"
(Eight minutes later, this one hit my homepage. Grrr...)

Over the years, FAST IPs have tracked back to Norway, and Massachusetts, as I recall, and goodness knows where else. This time around, "216.255.229.241" hails from Tokyo, Japan.

Nowadays, the minute I see a FAST IP, if it's getting 403'd for not reading/heeding robots.txt, I block it in the firewall. An overreaction, perhaps, but too many FAST-running individuals/companies have scraped the paint off the walls too many times.

Pfui




msg:400432
 10:03 pm on Mar 5, 2006 (gmt 0)

P.S.

On a related note, here's one more:

dnaspider04.mia.lycos.com - - [04/Feb/2006:18:16:02 -0800]
"GET /robots.txt HTTP/1.0"
"Lycos_Spider_(modspider)"

Heeded:

User-agent: Lycos
User-agent: Lycos_Spider_(T-Rex)
User-agent: Lycos_Spider_(modspider)

jdMorgan




msg:400433
 10:42 pm on Mar 5, 2006 (gmt 0)

Fast is a legitimate search company, previously powering the "AllTheWeb" search site -- actually a very good search engine -- but having now sold it and moved back into enterprise search exclusively. The Fast requests from .se domains are likely to be legitimate.

But I suspect the user-agent in the title of this thread is a spoof.

Jim

vortech




msg:400434
 12:37 am on Mar 6, 2006 (gmt 0)

Thanks for the info.

I also can confirm 209.190.21.*

Blocked by IP.

vortech




msg:400435
 12:48 am on Mar 7, 2006 (gmt 0)

200.68.65.*

FAST-WebCrawler/2.2.5+-+Lycos/Alltheweb/Fast - -

IPLANISP.COM.AR

This has to be a distributed spider or maybe a spammers group?

volatilegx




msg:400436
 2:09 pm on Mar 7, 2006 (gmt 0)

Fast sells enterprise search software. Perhaps the spider is part of their deal? This spider could be coming from anywhere/everywhere.

jdMorgan




msg:400437
 2:27 pm on Mar 7, 2006 (gmt 0)

Might be a 'bot sold by Fast, but...

> Since it didn't fetch robots.txt

Never had that problem with Fast themselves.

Jim

volatilegx




msg:400438
 2:53 pm on Mar 15, 2006 (gmt 0)

Here's an interesting tidbit...

70.42.51.10 "FAST MetaWeb Crawler (helpdesk at fastsearch dot com)" sends an accurate HTTP_REFERER header, unlike most spiders.

The IP is registered to Fast Search and Transfer.

adb64




msg:400439
 12:18 am on Mar 16, 2006 (gmt 0)

Saw this one a few times back in February, but for the past few days it has been spidering my site heavily. It now uses weird random UA names like "pynbjmxjcStcooccwSyd", "srhpdbdsgqsygi7ctsxpmwpdhqhmgwkfieeiwsy" or "dohdegijvi3GyxnirfqGfcwjvgodgghex3jict" and many more.
I just blocked the IP range 205.234.253.*, which is the only range I've seen it come from. I haven't seen the other IP ranges mentioned earlier.

Pfui




msg:400440
 11:44 pm on Mar 19, 2006 (gmt 0)

For those of you keeping score at home, here's Yet Another FAST-related bot variation/name:

sch-fast-se-isearch02.dev.osl.basefarm.net
schibstedsokbot (compatible; Mozilla/5.0; MSIE 5.0; FAST FreshCrawler 6; +http://www.schibstedsok.no/bot/)
03/18 20:25:59 /robots.txt 200 -

It's not consistent re robots.txt -- sometimes that's all it hits, sometimes not at all -- which is why I block all FAST spawn.

.
P.S. to adb64

It's been my experience that nonsensical UAs typically aren't FAST-related, and perhaps that explains why the IPs didn't match for you. I'm not sure who/what is behind the nonsense -- could be individuals playing with their browsers or a browser extension, or some program covering its tracks. (I suspect an extension or program.) Here are some similar fake UAs I've seen recently:

m bm9nswptqddtxqtrfjfqwur
kknfrskhn cxydbj9fymyhklr
rpy edmsjvblflwdx0tsromet0n0v
mqjngxaksvvBhtdshvgwBdgf8tBvh

Those get blocked automatically, but if I see repeated hits from the same ISP, or the IP/host name tracks back to a server farm, I rewrite the host, too. FWIW
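For the automatic part, a crude filter along these lines can catch the samples quoted in this thread. This is only a sketch of one possible heuristic, not the rule actually in use here: it flags UAs of 15 or more characters that consist of one or two runs of letters and digits joined by a single space or plus sign, with none of the slashes, parentheses or dots that normal UAs carry. NONSENSE_UA, looks_random and the length threshold are all assumptions.

```python
import re

# One or two alphanumeric runs, optionally joined by one space or "+".
# Real browser and spider UAs almost always contain "/", "(" or "."
# and so fail this pattern. The pattern itself is an assumption.
NONSENSE_UA = re.compile(r"[A-Za-z0-9]+(?:[ +][A-Za-z0-9]+)?")

def looks_random(ua: str, min_len: int = 15) -> bool:
    """Heuristically flag a user-agent string as random filler."""
    return len(ua) >= min_len and NONSENSE_UA.fullmatch(ua) is not None

samples = [
    "m bm9nswptqddtxqtrfjfqwur",
    "mqjngxaksvvBhtdshvgwBdgf8tBvh",
    "FAST-WebCrawler/2.2.5 - Lycos/Alltheweb/Fast",
]
for ua in samples:
    print(looks_random(ua), ua)
```

It will also flag any legitimate UA that happens to be a long plain word with no version number, so check what it catches before blocking on it.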


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved