| 12:43 am on Feb 17, 2004 (gmt 0)|
| 3:06 am on Feb 17, 2004 (gmt 0)|
Yahlurp lives :)
I spotted Yahlurp in a couple logs this morning.
| 3:14 am on Feb 17, 2004 (gmt 0)|
That is straight from Yahoo (thanks Tim [webmasterworld.com])
| 3:41 am on Feb 17, 2004 (gmt 0)|
Let's hope they increase the speed of the spider!
| 3:44 am on Feb 17, 2004 (gmt 0)|
Just to confirm, the UA mentioned in the first post is exactly what I saw in the logs:
Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
| 5:15 am on Feb 17, 2004 (gmt 0)|
Now we have Yahoo's own noisily eating Web Spider :)
(Look at definition #4 of slurp [google.com] at Google.)
| 7:59 am on Feb 17, 2004 (gmt 0)|
|Now we have Yahoo's own noisily eating Web Spider :) |
(Look at definition #4 of slurp at Google.)
Yeah, it always gives me a kick how web-centric the Google define: function is. You have to scroll waaay down the list for define:yahoo to find the traditional meaning of the term.
| 5:01 pm on Feb 17, 2004 (gmt 0)|
Anyone notice that Yahoo is archiving as well?
| 9:41 am on Feb 18, 2004 (gmt 0)|
What I am seeing is that sites that were kicked out of Google following the recent filter hoohah are included in the Yahoo listings.
Which is nice. For a start.
| 3:31 pm on Feb 18, 2004 (gmt 0)|
|I think this was a brave move by Yahoo! I think they did the right thing. |
I agree with their reason for keeping the Slurp name. Backwards compatibility, it's everyone's friend.
Why's it brave? Slurp was hardly a prestigious spider. I think it was pretty dumb and fairly cowardly, especially with query strings. I'm shooting from the hip, but I suspect the most intelligent spider Yahoo! bought on their search-engine-fest was FAST. I'd see Fast Yahoo! as a more serious threat to Googlebot. Although I shouldn't judge the new spider by its name, I still snigger when I see Slurp's user agent.
Yahoo Shopper, of course, has been bimbling around robotland for a while now.
| 9:54 pm on Feb 18, 2004 (gmt 0)|
|Anyone notice that Yahoo is archiving as well? |
Yes, and after a quick review of some large pages, it looks like it is caching pages over 101KB in size.
| 10:09 pm on Feb 18, 2004 (gmt 0)|
We index 500K for HTML, larger than Google's 101K. This is published in SearchDay.
| 4:06 am on Feb 19, 2004 (gmt 0)|
Can we call you TimothY!?
| 8:03 am on Feb 19, 2004 (gmt 0)|
I spot +Yahoo!+Slurp in my logs; however, it does not appear in my Sawmill report.
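A possible explanation, assuming the logs use a format such as IIS's W3C extended logging, which writes spaces in the user-agent field as `+`: a report filter looking for the literal string `Yahoo! Slurp` would never match `Yahoo!+Slurp`. I don't know how Sawmill's own filters behave, so here is just a minimal Python sketch of a matcher that accepts either form (the pattern and function name are hypothetical):

```python
import re

# "Yahoo! Slurp" with either a space or a "+" between the words, since
# some log formats (e.g. IIS W3C extended) write spaces in the UA as "+".
SLURP_PATTERN = re.compile(r"Yahoo![ +]Slurp")


def is_yahoo_slurp(user_agent: str) -> bool:
    """Return True if a logged user-agent string contains the Slurp token."""
    return SLURP_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    for ua in (
        "Mozilla/5.0 (compatible; Yahoo! Slurp; ...)",
        "Mozilla/5.0+(compatible;+Yahoo!+Slurp;+...)",
        "Yahoo-MMCrawler/3.x (mm dash crawler at trd dot overture dot com)",
    ):
        print(is_yahoo_slurp(ua), ua)
```

Matching on the UA string alone only tells you what the visitor claims to be; it does not prove the request came from Yahoo.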
| 1:09 pm on Feb 19, 2004 (gmt 0)|
Slurpoo! (heheheh) ;-)
| 9:03 pm on Feb 19, 2004 (gmt 0)|
The new Y! spider is hitting files banned by my robots.txt file...
Anyone else seeing this?
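One thing worth checking before blaming the bot: under the robots exclusion convention, a crawler obeys the record whose `User-agent` line names it, and the `*` record only applies to robots that match no other record. So if robots.txt has a `Slurp` section, anything banned only under `*` is open to Slurp. A minimal sketch using Python's standard `urllib.robotparser`, assuming Slurp matches on the token `Slurp`; the robots.txt content and URLs are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record for Slurp, one default record.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
"""


def may_fetch(robots_txt: str, url: str, agent: str = "Slurp") -> bool:
    """Check url against robots.txt rules for the given robot token."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for path in ("/private/page.html", "/cgi-bin/search.cgi", "/index.html"):
        url = "http://www.example.com" + path
        print(path, "Slurp:", may_fetch(ROBOTS_TXT, url),
              "others:", may_fetch(ROBOTS_TXT, url, agent="SomeOtherBot"))
```

Here /cgi-bin/ stays open to Slurp because its own record replaces the `*` record; repeating `Disallow: /cgi-bin/` under `User-agent: Slurp` would close it.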
| 10:15 pm on Feb 19, 2004 (gmt 0)|
Well, as far as I can see, it's obeying robots.txt, but I think Yahoo is rewriting some of my URLs like:
While my robots.txt says
| 2:18 am on Feb 20, 2004 (gmt 0)|
Tim, thanks for the heads-up about the SearchDay article.
| 9:05 pm on Feb 20, 2004 (gmt 0)|
I'm seeing a bot in my log files:
|(Yahoo-MMCrawler/3.x (mm dash crawler at trd dot overture dot com) |
that resolves to mmscrm03-1.sac.overture.com and the whois says overture.com;
but I also get -
|Network Data |
Network id#: 1
Qwest Cybercenters QWEST-CYBERCENTER-2 (NET-66-77-0-0-1)
188.8.131.52 - 184.108.40.206
Fast Search, Inc. QWEST-MCC-FASTSRCH3 (NET-66-77-73-0-1)
220.127.116.11 - 18.104.22.168
ARIN WHOIS database, last updated 2004-02-19 19:15
Is this a new crawler that's using a FAST/Overture combo of some sort?
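Whois only shows who owns the address block. To confirm that a visitor claiming to be an Overture or Yahoo crawler really is one, a common approach is forward-confirmed reverse DNS: look up the PTR record, check the hostname's domain, then resolve that hostname and make sure it maps back to the same IP. A minimal Python sketch; the `.overture.com` suffix comes from the hostname above, and the pluggable lookup functions are an assumption added so it can be tried without network access:

```python
import socket
from typing import Callable, Iterable, List, Optional


def _default_reverse(ip: str) -> str:
    # PTR lookup; raises socket.herror (an OSError) if there is no record.
    return socket.gethostbyaddr(ip)[0]


def _default_forward(host: str) -> List[str]:
    # A-record lookup; raises socket.gaierror (an OSError) on failure.
    return socket.gethostbyname_ex(host)[2]


def forward_confirmed_host(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = _default_reverse,
    forward: Callable[[str], List[str]] = _default_forward,
) -> Optional[str]:
    """Return the verified hostname for ip, or None if the check fails."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return None
    # Suffixes should start with "." so "evil-overture.com" is rejected.
    if not host.endswith(tuple(s.lower() for s in allowed_suffixes)):
        return None
    try:
        addresses = forward(host)
    except OSError:
        return None
    # The hostname must resolve back to the address we started from.
    return host if ip in addresses else None
```

In real use you would call `forward_confirmed_host(addr, [".overture.com"])` with the default resolvers; a forged PTR record that points at an overture.com name fails the forward step.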
| 9:13 pm on Feb 20, 2004 (gmt 0)|
Multimedia crawler for the Alltheweb, AV and other Overture partners' index.
| 12:41 pm on Feb 21, 2004 (gmt 0)|
The Slurp spider has been checking robots.txt several times a day, almost hourly, but nothing else, for weeks now. I find myself reading that behavior like tea leaves, and I don't like it. The site is well established but absent from the Y! SERPs.
On the other hand, the Yahoo Seeker spider is spidering deeply.
| 10:07 pm on Feb 22, 2004 (gmt 0)|
So what is Yahoo using currently? Inktomi or is it their own engine now? How does it work? Any tips on optimization?
| 9:59 am on Feb 23, 2004 (gmt 0)|
|Multimedia crawler for the Alltheweb, AV and other Overture partners' index. |
Hi Tim, it's great to have you on the forums.
Is this a fair assessment?
+ Scooter still finds results for AV, and AV is still a standalone search engine in its own right.
+ FAST still finds results for AtW, and AtW is still a standalone search engine in its own right.
However, are new technologies developed by Yahoo! likely to be single items shared by all the search engines the company owns? Is Yahoo-MMCrawler an example of this?
I think the new Yahoo results look really good. The web page summaries are especially fair.