The reason I ask is that despite a recent listing in the Yahoo directory, Yahoo hasn't indexed any of my subpages, only my index page. I was wondering how long this could take and whether "crawl" could be Yahoo.
If it is Yahoo, why aren't my subpages being indexed when it's visiting my site?
Any wisdom would be greatly appreciated...
1. Googlebot (Google)
2. Unknown robot (identified by 'crawl')
3. Jeeves
4. LinkChecker
5. Inktomi Slurp
6. MSIECrawler
7. Road Runner: The ImageScape Robot
8. Alexa (IA Archiver)
9. Bumblebee (relevare.com)
10. BaiDuSpider
11. Unknown robot (identified by 'spider')
12. IBM_Planetwide
13. LinkWalker
14. larbin
Thanks!
Rollo
Looks like you are using AWStats? I use it too, but didn't think much about "crawl" until your post...
tail -100000 access_log | grep "crawl"
comes out with this:
200 19369 "-" "Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)"
Looks like some sort of multimedia search crawler from Yahoo. On my site it was looking for pictures.
So thanks for bringing this up!
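For anyone who wants to go a step beyond the grep one-liner, here's a minimal Python sketch that pulls out just the User-Agent field before matching, so a hit on the referrer or URL doesn't count. It assumes the Apache "combined" log format (UA as the last double-quoted field) and matches case-insensitively, like `grep -i`. The function name is made up, and the second sample line is an invented browser hit for contrast.

```python
import re

# Assumption: Apache "combined" log format, where the User-Agent is the
# last double-quoted field on the line.
UA_RE = re.compile(r'"([^"]*)"\s*$')


def user_agents_matching(lines, needle):
    """Yield the UA of each log line whose UA contains `needle`, ignoring case."""
    needle = needle.lower()
    for line in lines:
        match = UA_RE.search(line)
        if match and needle in match.group(1).lower():
            yield match.group(1)


# First line: the (truncated) hit shown above. Second line: made-up browser hit.
sample_lines = [
    '200 19369 "-" "Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)"',
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

for ua in user_agents_matching(sample_lines, "crawl"):
    print(ua)
```

To run it against a real log, pass an open file instead, e.g. `user_agents_matching(open("access_log"), "crawl")`.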
xcomm
BTW: Does anyone know where at Yahoo the MMSearch is located? (Sorry, I'm too much of a Googler :-))
Am I wrong, or is the "Inktomi Slurp" bot still out there? How active is it? Are many people seeing it?
I have the same challenge with Yahoo not crawling deep into my site. New site and no penalties. Are there other Yahoo UAs I should be looking for?
The odd thing is that even though the MSN bot is going deeper into the site, the results are not being indexed.
Why is the bot crawling as such and not publishing the results? Any SEO comments or tips on MSN crawling/indexing?
1. > Does anyone have any idea who #2 "crawl" could be?
2. > Anyone know how the new MSN search is identified?
3. > Hasn't the "Inktomi Slurp" designation as user agent been deprecated?
As a long-time user of AWStats - and one who has nosed through the code - I can tell you that #2 (Unknown robot (identified by 'crawl')) could be any number of spiders not currently identified by the software. This is also the case with "identified by 'spider'" & "by 'robot'."
What is happening here is the software looks at the User Agent and searches for some identifying string in it. If it finds "Googlebot", well then that's a visit from Google, and is identified as such.
In an effort to increase accuracy and not identify unknown robots as human visitors, AWStats also searches for the words "crawl", "spider" & "robot" in the UA (if it hasn't already figured out who the visitor is).
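The two-pass lookup described above can be sketched in a few lines of Python. This is not AWStats' own code (that's Perl, with a far longer pattern list in robots.pm); the tiny pattern table and the `classify_ua` name are hypothetical. Matching is case-insensitive here, which fits the Grub example that follows, where a capitalized "Crawl" lands under the 'crawl' heading. The Grub and Baidu UAs are the ones quoted in this post; the Googlebot and MSIE strings are illustrative.

```python
from typing import Optional

# Hypothetical, heavily abbreviated table of (substring, display name).
# AWStats' real robots.pm has hundreds of entries with its own patterns.
KNOWN_ROBOTS = [
    ("googlebot", "Googlebot (Google)"),
    ("ia_archiver", "Alexa (IA Archiver)"),
    ("slurp", "Inktomi Slurp"),
]

# Catch-all words, checked only when no known robot matched.
GENERIC_WORDS = ["crawl", "spider", "robot"]


def classify_ua(ua: str) -> Optional[str]:
    """Return a robot label for `ua`, or None if it would count as a human visit."""
    ua_lower = ua.lower()
    # Pass 1: specific, known robots.
    for pattern, name in KNOWN_ROBOTS:
        if pattern in ua_lower:
            return name
    # Pass 2: generic words catch robots the table doesn't know by name.
    for word in GENERIC_WORDS:
        if word in ua_lower:
            return f"Unknown robot (identified by '{word}')"
    return None


print(classify_ua("Googlebot/2.1 (+http://www.google.com/bot.html)"))
# Googlebot (Google)
print(classify_ua("Mozilla/4.0 (compatible; grub-client-1.5.3; Crawl your own stuff with http:// grub.org)"))
# Unknown robot (identified by 'crawl')
print(classify_ua("Baiduspider+(+http:// www.baidu.com/search/spider.htm)"))
# Unknown robot (identified by 'spider')
print(classify_ua("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
# None
```

The order matters: a named robot is always reported by name, and the generic words only act as a safety net so that unknown robots aren't counted as people.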
Here's an example...
While I haven't seen it in months, LookSmart has a distributed crawler called "Grub." (See grub.org.) AWStats does not identify visits by Grub as Grub, but it does catch them and includes them under the "identified by 'crawl'" heading. Here's a Grub UA:
Mozilla/4.0 (compatible; grub-client-1.5.3; Crawl your own stuff with http:// grub.org)
(I broke the link in the UA.)
Note the word "Crawl" in the UA.
Quickly, another example...
Baidu's (baidu.com) robot, Baiduspider, is not identified as such, but it is included under the "spider" heading:
Baiduspider+(+http:// www.baidu.com/search/spider.htm)
(I broke the link again...)
This method of trying to identify the known & unknown isn't perfect - I have yet to be visited by the real "IBM_Planetwide" & "EchO!" robots, despite what the stats may say. These human visitors have been misidentified as robots. Check this:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Echo OnLine; .NET CLR 1.0.3705)
"Echo Online" is an ISP in Toronto, Canada.
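To see why a plain substring match misfires on that Echo OnLine visitor, here's a minimal Python check. The pattern "echo" is an assumption standing in for whatever AWStats' real EchO! entry is; the point is only that an ISP name embedded in a browser UA can trip a robot pattern, and that requiring word boundaries doesn't help when the ISP name is itself a whole word.

```python
import re

# Hypothetical robot pattern; AWStats' actual entry for "EchO!" may differ.
ECHO_PATTERN = "echo"

# The human visitor's UA quoted above (an ISP tag, not a robot).
HUMAN_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Echo OnLine; .NET CLR 1.0.3705)"


def substring_match(ua: str, pattern: str) -> bool:
    """Case-insensitive substring test, like a simple robot lookup table."""
    return pattern.lower() in ua.lower()


def word_match(ua: str, pattern: str) -> bool:
    """Stricter test: the pattern must appear as a whole word."""
    return re.search(r"\b" + re.escape(pattern) + r"\b", ua, re.IGNORECASE) is not None


print(substring_match(HUMAN_UA, ECHO_PATTERN))  # True: a false positive
print(word_match(HUMAN_UA, ECHO_PATTERN))       # True: still a false positive
```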
I just noticed that Rollo listed Baiduspider... We - Rollo & I - are most likely using different versions of AWStats. Yeah, that's my story... :) (I'm looking at v6.0 Build 1.704.)
And that brings me to the second & third questions...
If the question is how is the new MSN robot identified by AWStats, the answer is - it's not. Currently, this robot is included in the stats as if it were a human (having a number of implications, such as artificially lowering the average visit duration).
And yes, "Inktomi Slurp" is (now) incorrect. I imagine current or future versions of AWStats will correctly identify the robot as (something like) "Yahoo! Slurp".
(I suppose it's worth noting that just this past Saturday, v6.2(beta) was released. See awstats.sourceforge.net.)
If you run your own copy of AWStats, and know just a bit of perl, you can modify the robots.pm file to identify any robot not currently identified by the software.
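In AWStats itself this means editing the Perl in robots.pm, whose exact layout isn't shown in this thread, so treat the following only as a Python analogue of the idea: add one table entry, and a UA that previously fell through as "human" gets labeled as a robot. The msnbot UA string, the "MSNBot" label and the `identify` helper are assumptions for illustration; check your own logs for the real UA.

```python
from typing import Dict, Optional

# Catch-all words, checked only when no named robot matched.
GENERIC_WORDS = ("crawl", "spider", "robot")


def identify(ua: str, known: Dict[str, str]) -> Optional[str]:
    """Return a robot label, or None if the hit would be counted as human."""
    ua_lower = ua.lower()
    for pattern, name in known.items():
        if pattern in ua_lower:
            return name
    for word in GENERIC_WORDS:
        if word in ua_lower:
            return f"Unknown robot (identified by '{word}')"
    return None


# Hypothetical table and UA; substitute the UA string from your own logs.
known_robots = {"googlebot": "Googlebot (Google)"}
msn_ua = "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"

print(identify(msn_ua, known_robots))  # None: counted as a human visitor

# The fix: one new entry, analogous to adding an entry in robots.pm.
known_robots["msnbot"] = "MSNBot"
print(identify(msn_ua, known_robots))  # MSNBot
```

Note that this UA contains none of the generic words, which is why, without a named entry, it would slip through as a human visit.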
And, unrelated to AWStats...
> Why is the bot crawling as such and not publishing the results?
lstrand, I think the results are available... Have you tried searching at techpreview.search.msn.com? I think I heard that's where the results of MSN's crawling can be found.
> I think I heard that's where the results of MSN's crawling can be found.
Here's a link to all the MSN tools, one of which is the link provided by balam:
[sandbox.msn.com...]
Don