Mozilla/2.0 (compatible; Ask Jeeves)

With a referer?

         

bobriggs

4:46 am on Nov 27, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've been seeing this one for the last few weeks:

IPs:
140.239.251.222 - Direct Hit Technologies
65.214.36.53 - Ask Jeeves, Inc.

UA: Mozilla/2.0 (compatible; Ask Jeeves)

They only get the contact and root pages.

I remember that's one of the Direct Hit spiders. However, it is sending a referer string (www.ask.com/), which I thought was strange. Anybody else seeing this?

berno

10:50 am on Nov 27, 2001 (gmt 0)

10+ Year Member



Me too:
From 140.239.251.221
and 65.214.36.51, with the same user agent.
Both arrived with a www.ask.com referer on the 22nd.
The first requested an inside page and the second the index.
????????

Will

1:40 pm on Nov 27, 2001 (gmt 0)



That's EZSpider - owned by Direct Hit, spiders for Ask Jeeves too. Been around for a while now.

bobriggs

2:16 pm on Nov 27, 2001 (gmt 0)




Yes, I know about that one:
216.200.130.204 ezspider304.directhit.com

The point I'm making is that the first two are sending a string in HTTP_REFERER, and the IPs seem to be new to my site as well. The only other spider I've seen that sends an HTTP_REFERER string is picsearch, and that referer is always '-'.

Will

4:29 pm on Nov 27, 2001 (gmt 0)



Ah, I see...

Here are a few others that send specific HTTP_REFERER strings:

Robozilla (dmoz.org)
Netcraft (the "server software stats" people - try saying that after a few shandies, netcraft.com)
TulipChain (ostermiller.org/tulipchain)
PingaLink (pingalink.com)

There are also some as-yet-unidentified bots which frequently send referers like "synd.looksmart.co.uk".

Most of the big search engines normally send a string in the HTTP_FROM field (including Google, Lycos, Inktomi, Altavista, Excite, AlltheWeb, NorthernLight).

That is, the bots that make themselves obvious do, in any case.

Will

4:32 pm on Nov 27, 2001 (gmt 0)



Oops, almost forgot. Here are some more verified IPs for the DirectHit/Ask spider:

64.55.148.37-9
64.55.148.43-5
64.55.148.50-4
140.239.251.230
207.204.132.233-4
208.178.104.55
209.67.252.197
209.67.252.199
209.67.252.211-6
216.34.121.18-9
216.34.121.31-4
216.34.121.67
216.34.121.100
216.200.130.20,26,77-9,85-9,200-8,242,244-6,248-9

Not all in active use at present as far as I know.
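
For anyone wanting to match log entries against the list above, here is a minimal sketch that expands the shorthand into full addresses. It assumes the convention that the number after the hyphen replaces the trailing digits of the start value (so "37-9" means .37 through .39 and "200-8" means .200 through .208); the function name is made up for illustration.

```python
def expand_last_octet(spec):
    """Expand shorthand such as '64.55.148.37-9' into a list of full IPs."""
    prefix, _, tail = spec.rpartition(".")
    addresses = []
    for part in tail.split(","):
        if "-" in part:
            start, end = part.split("-")
            # Assumed convention: the end value overwrites the trailing
            # digits of the start value, so '211-6' means 211..216.
            keep = max(0, len(start) - len(end))
            first, last = int(start), int(start[:keep] + end)
        else:
            first = last = int(part)
        addresses.extend(f"{prefix}.{n}" for n in range(first, last + 1))
    return addresses


spider_ips = set(expand_last_octet("64.55.148.37-9"))
print("64.55.148.38" in spider_ips)
```

A set makes the membership check cheap when scanning a large access log.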

HTH

bobriggs

12:11 am on Nov 29, 2001 (gmt 0)




Thanks... I know Robozilla and Netcraft but can't remember a referer from either, although off the top of my head I think Robozilla might send dmoz.org. I only see it every six months to a year. The last two I've never heard of. Have you seen the ask.com/ referer string in the Ask Jeeves/Direct Hit spider before? That's really what I'm asking.

'Most of the big search engines normally send a string in the HTTP_FROM field (including Google, Lycos, Inktomi, Altavista, Excite, AlltheWeb, NorthernLight)'

HTTP_FROM I'm not tracking.
Exactly what string do they send? This could be useful, I'd like to log those differently from regular hits.

littleman

12:27 am on Nov 29, 2001 (gmt 0)



There is an extra 'R' in Ask's HTTP_REFERRER. It is a subtle difference, but will keep the header from being logged most of the time.
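
To illustrate why the misspelled header goes unnoticed: under the CGI convention a request header named "Referrer" shows up as HTTP_REFERRER, so a script that only reads HTTP_REFERER never sees it. A rough sketch, assuming a CGI-style environment dictionary (the function name is hypothetical):

```python
def referer_fields(environ):
    """Return any referer-like variables present, under either spelling."""
    found = {}
    for name in ("HTTP_REFERER", "HTTP_REFERRER"):
        value = environ.get(name)
        if value:
            found[name] = value
    return found


# A request carrying only the misspelled header: a logger that reads
# HTTP_REFERER alone would record no referer for this hit.
request_env = {
    "HTTP_USER_AGENT": "Mozilla/2.0 (compatible; Ask Jeeves)",
    "HTTP_REFERRER": "http://www.ask.com/",
}
print(referer_fields(request_env))
```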

Will

10:48 am on Nov 29, 2001 (gmt 0)



thanks littleman - that's new to me, useful to know. Guess that's why I hadn't seen it before!

bob - typically, agents send either a URL or email address in the FROM header. According to the robot guidelines, it is intended as a means of contacting the operator, so I'd say an email address is probably more desirable (provided they bother to maintain it). For example, Excite has "spider@atext.com" and Google has both a URL and an email address.
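
If the goal is to log those hits separately, something along these lines might do as a starting point. It is only a sketch: it assumes the From header arrives as HTTP_FROM in a CGI-style environment, and the classification is deliberately crude (a value containing both a URL and an address would simply be tagged by whichever test matches first).

```python
from urllib.parse import urlparse


def classify_from(value):
    """Roughly classify a From header value as 'url', 'email', 'other' or 'none'."""
    if not value:
        return "none"
    value = value.strip()
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    if "@" in value:
        return "email"
    return "other"


def hit_kind(environ):
    """Tag a hit as 'robot' when any From header is present, else 'regular'."""
    return "regular" if classify_from(environ.get("HTTP_FROM")) == "none" else "robot"


print(classify_from("spider@atext.com"), hit_kind({"HTTP_FROM": "spider@atext.com"}))
```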

Josk

2:22 pm on Nov 29, 2001 (gmt 0)

10+ Year Member



You can add 140.239.251.221, 140.239.251.223, and 140.239.251.224 to that ip list...

littleman

9:45 pm on Dec 6, 2001 (gmt 0)



Okay, it looks like Ask is now sending out spiders with an actual HTTP_REFERER, without the extra R. This is a very bad thing to do: all over the net, people are acquiring bogus traffic.
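
One way to keep these hits out of referral stats is to check the user agent before counting a referer. A minimal sketch, assuming the spider keeps sending the "Ask Jeeves" user agent string quoted at the top of this thread (the function name and the marker tuple are illustrative, not a complete list):

```python
SPIDER_UA_MARKERS = ("Ask Jeeves",)  # taken from the UA reported in this thread


def is_spider_referral(user_agent, referer):
    """True when a hit carries a referer but the user agent looks like a known spider."""
    if not referer or referer == "-":
        return False
    return any(marker in (user_agent or "") for marker in SPIDER_UA_MARKERS)


print(is_spider_referral("Mozilla/2.0 (compatible; Ask Jeeves)", "http://www.ask.com/"))
```

Matching on source IP as well would be safer, since a user agent string is easy to change.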

bobriggs

10:09 pm on Dec 6, 2001 (gmt 0)




I knew I couldn't pick it up if it had two R's because I'm only logging the actual refe(r)er.

What will happen to ask.com's PR on google? Some public logs will show it as an http link, including mine.

littleman

10:20 pm on Dec 6, 2001 (gmt 0)



In my not so humble opinion it is a really stupid/deceptive thing for a portal to do. The typical webmaster will think the traffic is genuine.