Name the Mystery Robot...

Why is Yahoo not indexing my subpages?

         

Rollo

3:46 am on Jul 8, 2004 (gmt 0)

10+ Year Member



I have a question for spider/robot experts. The below list is from my traffic stats in order of hit frequency. I'd like to know if anyone has any opinions about who the mysterious "crawl" (#2) is...

The reason I ask is that despite a recent listing in Yahoo directory, other than my index page, Yahoo hasn't indexed any of my subpages. I was wondering how long this could take and if "crawl" could be Yahoo.

If it is Yahoo, why are my subpages not being indexed if it is visiting my site?

Any wisdom would be greatly appreciated...

Googlebot (Google)
Unknown robot (identified by 'crawl')
Jeeves
LinkChecker
Inktomi Slurp
MSIECrawler
Road Runner: The ImageScape Robot
Alexa (IA Archiver)
Bumblebee (relevare.com)
BaiDuSpider
Unknown robot (identified by 'spider')
IBM_Planetwide
LinkWalker
larbin

Thanks!

Rollo

volatilegx

2:29 pm on Jul 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The Slurp spider is Yahoo's.

Rollo

2:46 pm on Jul 8, 2004 (gmt 0)

10+ Year Member



Inktomi Slurp?

Do you know why it doesn't seem to be indexing my subpages? How long does the Slurp take to index an entire site? Mine has about 100 pages - all linked from the index page.

Thanks,

Rollo

volatilegx

6:08 pm on Jul 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't know... you might want to ask in the Yahoo forum [webmasterworld.com].

ncw164x

6:17 pm on Jul 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do a search at Google for "optimize for inktomi".

It may be one simple little thing you have overlooked.

ncw164x

Rollo

10:45 pm on Jul 12, 2004 (gmt 0)

10+ Year Member



Does anyone have any idea who #2 "crawl" (above post) could be?

Thanks!

xcomm

11:49 am on Jul 13, 2004 (gmt 0)

10+ Year Member



Hi Rollo,

Looks like you are using AWStats? I use it too, but did not think much about "crawl" until your post...

tail -100000 access_log | grep "crawl"

comes out with this:

200 19369 "-" "Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)"

Looks like some MultiMediaSearch or Crawler from Yahoo. On my page it looked for pictures.
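
A rough Python equivalent of that grep, for anyone who would rather script it: it pulls the user-agent field out of each log line and counts the ones containing a keyword. This is only a sketch. It assumes the combined log format, where the UA is the last quoted field on the line, and unlike the grep above it matches case-insensitively. The function name is made up for illustration.

```python
import re
from collections import Counter

# Assumption: combined log format, so the user agent is the last quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

def user_agents_matching(lines, keyword):
    """Count the user agents that contain `keyword` (case-insensitive)."""
    counts = Counter()
    needle = keyword.lower()
    for line in lines:
        match = UA_FIELD.search(line)
        if match and needle in match.group(1).lower():
            counts[match.group(1)] += 1
    return counts

# Example use (the file name is whatever your server calls its log):
#   with open("access_log", errors="replace") as log:
#       for ua, hits in user_agents_matching(log, "crawl").most_common():
#           print(hits, ua)
```

Counting per user agent instead of printing raw lines makes it easier to see which "crawl" visitors are actually the frequent ones.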

So thanks for bringing this up!
xcomm
BTW: Does anyone know where at Yahoo the MMSearch is located? (Sorry, I'm too much of a Googler :-)

fiestagirl

3:41 pm on Jul 13, 2004 (gmt 0)

10+ Year Member



The robot that you are seeing used to be the AltaVista image robot.
User-agent: vscooter

New UA: Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)
IP: 66.94.233.*
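
If you want to flag these hits in your own log processing, a check along these lines might do. It is a minimal sketch: the network and UA prefix are simply the values quoted above (they may change, and a UA alone can be spoofed), and the function name is hypothetical.

```python
import ipaddress

# Both values come from the post above and may well change over time.
MMCRAWLER_NET = ipaddress.ip_network("66.94.233.0/24")  # i.e. 66.94.233.*
MMCRAWLER_UA_PREFIX = "Yahoo-MMCrawler/"

def looks_like_yahoo_mmcrawler(ip, user_agent):
    """True if both the client IP and the UA fit the pattern described above."""
    return (ipaddress.ip_address(ip) in MMCRAWLER_NET
            and user_agent.startswith(MMCRAWLER_UA_PREFIX))
```

Checking the IP as well as the UA guards against visitors that merely copy the user-agent string.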

Rollo

12:52 am on Jul 15, 2004 (gmt 0)

10+ Year Member



Hey... thanks all!

Yeah, I love awstats. A very good program.

I was wondering what those were...

(Anyone know how the new MSN search is identified?)

lstrand

2:58 pm on Jul 26, 2004 (gmt 0)

10+ Year Member



Hasn't the "Inktomi Slurp" designation as user agent been deprecated? I see it only once in a long while in my web logs now that Yahoo has bought Inktomi and is using the "Mozilla/5.0+(compatible;+Yahoo!+Slurp;+http://help.yahoo.com/help/us/ysearch/slurp)" UA.

Am I wrong and the "Inktomi Slurp" bot is still out there? How active is it? Many people seeing it?

I have the same challenge with Yahoo not crawling deep into my site. It is a new site with no penalties. Are there other Yahoo UAs I should be looking for?

lstrand

3:01 pm on Jul 26, 2004 (gmt 0)

10+ Year Member



Rollo, recently MSN bot has been using the UA "msnbot/0.11+(+http://search.msn.com/msnbot.htm)" to crawl my sites.

The odd thing is even though MSN bot is going deeper into the site, the results are not being indexed.

Why is the bot crawling as such and not publishing the results? Does anyone have SEO tips on getting crawled and indexed by MSN?

balam

11:09 pm on Jul 26, 2004 (gmt 0)

10+ Year Member



Regarding a few questions from this thread...

1. > Does anyone have any idea who #2 "crawl" could be?

2. > Anyone know how the new MSN search is identified?

3. > Hasn't the "Inktomi Slurp" designation as user agent been deprecated?

As a long-time user of AWStats - and one who has nosed through the code - I can tell you that #2 (Unknown robot (identified by 'crawl')) could be any number of spiders not currently identified by the software. This is also the case with "identified by 'spider'" & "identified by 'robot'".

What is happening here is the software looks at the User Agent and searches for some identifying string in it. If it finds "Googlebot", well then that's a visit from Google, and is identified as such.

In an effort to increase accuracy and not identify unknown robots as human visitors, AWStats also searches for the words "crawl", "spider" & "robot" in the UA (if it hasn't already figured out who the visitor is).

Here's an example...

While I haven't seen it in months, LookSmart has a distributed crawler called "Grub." (See grub.org.) AWStats does not identify visits by Grub as Grub, but it does catch them and includes them under the "identified by 'crawl'" heading. Here's a Grub UA:

Mozilla/4.0 (compatible; grub-client-1.5.3; Crawl your own stuff with http:// grub.org)

(I broke the link in the UA.)

Note the "Crawl" in the UA.

Quickly, another example...

Baidu's (baidu.com) robot, Baiduspider, is not identified as such, but it is included under the "spider" heading:

Baiduspider+(+http:// www.baidu.com/search/spider.htm)

(I broke the link again...)

This method of trying to identify the known & unknown isn't perfect - I have yet to be visited by the real "IBM_Planetwide" & "EchO!" robots, despite what the stats may say. These human visitors have been misidentified as robots. Check this:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Echo OnLine; .NET CLR 1.0.3705)

"Echo Online" is an ISP in Toronto, Canada.

I just noticed that Rollo listed Baiduspider... We - Rollo & I - are most likely using different versions of AWStats. Yeah, that's my story... :) (I'm looking at v6.0 Build 1.704.)
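
To make the matching order described above concrete, here is a small Python sketch. It is not AWStats's actual code (that is Perl, and its robot list is far longer): the two-entry table and the "echo" search string are assumptions chosen only to reproduce the examples from this post.

```python
# Hypothetical table of known robots: (substring to look for, display name).
# The "echo" entry is an assumption used to show the misidentification above.
KNOWN_ROBOTS = [
    ("googlebot", "Googlebot (Google)"),
    ("echo", "EchO!"),
]

# Generic words tried only when no known robot matched.
GENERIC_KEYWORDS = ["crawl", "spider", "robot"]

def classify(user_agent):
    """Return a robot label, or None if the visitor is counted as human."""
    ua = user_agent.lower()
    for needle, name in KNOWN_ROBOTS:
        if needle in ua:
            return name
    for word in GENERIC_KEYWORDS:
        if word in ua:
            return "Unknown robot (identified by '%s')" % word
    return None
```

With this table, the Grub UA lands under "identified by 'crawl'", the Baiduspider UA under "identified by 'spider'", and the "Echo OnLine" browser is wrongly reported as EchO!, which is the false positive described above.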

And that brings me to the second & third question...

If the question is how is the new MSN robot identified by AWStats, the answer is - it's not. Currently, this robot is included in the stats as if it were a human (having a number of implications, such as artificially lowering the average visit duration).

And yes, "Inktomi Slurp" is (now) incorrect. I imagine current or future versions of AWStats will correctly identify the robot as (something like) "Yahoo! Slurp".

(I suppose it's worth noting that just this past Saturday, v6.2(beta) was released. See awstats.sourceforge.net.)

If you run your own copy of AWStats, and know just a bit of perl, you can modify the robots.pm file to identify any robot not currently identified by the software.

And, unrelated to AWStats...

> Why is the bot crawling as such and not publishing the results?

lstrand, I think the results are available... Have you tried searching at techpreview.search.msn.com? I think I heard that's where the results of MSN's crawling can be found.

wilderness

12:53 am on Jul 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> I think I heard that's where the results of MSN's crawling can be found.

Here's a link to all the MSN tools, one of which is the link provided by balam

[sandbox.msn.com...]

Don