Yahlurp lives :)
I spotted Yahlurp in a couple logs this morning.
That is straight from Yahoo (thanks Tim [webmasterworld.com])
Let's hope they increase the speed of the spider!
To verify: the UA mentioned in the first post is exactly what I saw in the logs:
Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
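For anyone who wants to check their own logs for the same UA, here is a rough sketch rather than a definitive tool. It assumes one request per log line with the request written as "GET /path ...", and it also matches the "+"-escaped form of the UA that some log formats produce. The function name, sample lines and IP addresses are made up for illustration, and the UA in the samples is left truncated just as it is above.

```python
import re
from collections import Counter

# Matches "Yahoo! Slurp" and the "+"-escaped variant "Yahoo!+Slurp".
SLURP_PATTERN = re.compile(r"Yahoo![ +]Slurp", re.IGNORECASE)

# Assumption: the request method is followed by a space and the path.
REQUEST_PATTERN = re.compile(r"\b(?:GET|HEAD|POST) (\S+)")


def count_slurp_hits(log_lines):
    """Count requests per path for log lines that mention Yahoo! Slurp."""
    hits = Counter()
    for line in log_lines:
        if not SLURP_PATTERN.search(line):
            continue
        match = REQUEST_PATTERN.search(line)
        path = match.group(1) if match else "(unknown)"
        hits[path] += 1
    return hits


# Hypothetical sample lines; the UA string is deliberately left truncated.
sample_lines = [
    '192.0.2.10 - - [19/Feb/2004:10:00:00 +0000] "GET /index.html HTTP/1.0" '
    '200 512 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; ...)"',
    "2004-02-19 10:05:00 192.0.2.10 GET /about.html 200 "
    "Mozilla/5.0+(compatible;+Yahoo!+Slurp;+...)",
    '192.0.2.20 - - [19/Feb/2004:10:06:00 +0000] "GET /index.html HTTP/1.0" '
    '200 512 "-" "Googlebot/2.1"',
]

slurp_hits = count_slurp_hits(sample_lines)
for path, count in sorted(slurp_hits.items()):
    print(path, count)
```

If your log format puts the path somewhere else, only REQUEST_PATTERN should need changing.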
Now we have Yahoo's own noisily eating Web Spider :)
(Look at definition #4 of slurp [google.com] at Google.)
|Now we have Yahoo's own noisily eating Web Spider :) |
(Look at definition #4 of slurp at Google.)
Yeah, it always gives me a kick how web-centric the Google define: function is. You have to scroll waaay down the list for define:yahoo to find the traditional meaning of the term.
Anyone notice that Yahoo is archiving as well?
What I am seeing is that sites that were kicked out of Google following the recent filter hoohah are included in the Yahoo listings.
Which is nice. For a start.
I think this was a brave move by Yahoo!
I think they did the right thing.
I agree with their reason for keeping the Slurp name. Backwards compatibility, it's everyone's friend.
Why's it brave? Slurp was hardly a prestigious spider. I think it was pretty dumb and fairly cowardly; especially for query strings. I'm shooting from the hip but I suspect the most intelligent spider Yahoo! bought on their search-engine-fest was FAST. I'd see Fast Yahoo! as a more serious threat to Googlebot, but although I shouldn't judge the new spider by its name, I still snigger when I see Slurp's user agent.
Yahoo Shopper, of course, has been bimbling around robotland for a while now.
|Anyone notice that Yahoo is archiving as well? |
Yes, and after a quick review of some large pages, it looks like it is caching pages over 101KB in size.
We index 500K for HTML, larger than Google's 101K. This is published in SearchDay.
Can we call you TimothY!?
I spotted +Yahoo!+Slurp in my logs. However, it does not appear in my Sawmill report.
Slurpoo! (heheheh) ;-)
The new Y! spider is hitting files banned by my robots.txt file...
Anyone else seeing this?
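One way to double-check before blaming the spider is to run the paths it requested through a robots.txt parser. This is only a sketch: the robots.txt content and paths below are invented, the helper name is hypothetical, and it assumes the spider matches rules under the "Slurp" user-agent token. Python's standard-library parser may also read edge cases differently from Yahoo's own.

```python
from urllib.robotparser import RobotFileParser


def disallowed_hits(robots_txt, requested_paths, agent="Slurp"):
    """Return the requested paths that robots_txt disallows for the agent."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return [path for path in requested_paths if not parser.can_fetch(agent, path)]


# Hypothetical robots.txt and request paths, for illustration only.
example_robots = """\
User-agent: Slurp
Disallow: /private/
"""
example_paths = ["/index.html", "/private/report.html"]

flagged = disallowed_hits(example_robots, example_paths)
print(flagged)
```

Anything in the returned list is a path the parser thinks the spider should not have fetched. If the list is empty, the rule may simply not cover the URL as it was actually requested.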
Well as far as I can see it's obeying robots.txt but I think Yahoo is rewriting some of my URLs like:
While my robots.txt says
Tim, thanks for the heads up about the searchday article
I'm seeing a bot in my log files:
|(Yahoo-MMCrawler/3.x (mm dash crawler at trd dot overture dot com) |
that resolves to mmscrm03-1.sac.overture.com and the whois says overture.com;
but I also get -
|Network Data |
Network id#: 1
Qwest Cybercenters QWEST-CYBERCENTER-2 (NET-66-77-0-0-1)
188.8.131.52 - 184.108.40.206
Fast Search, Inc. QWEST-MCC-FASTSRCH3 (NET-66-77-73-0-1)
220.127.116.11 - 18.104.22.168
ARIN WHOIS database, last updated 2004-02-19 19:15
Is this a new crawler that's using a FAST/Overture combo of some sort..?
Multimedia crawler for Alltheweb, AV and other Overture partners Index.
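The reverse lookup described a few posts up can be scripted. The sketch below does a PTR lookup with the standard socket module and then checks the hostname against a domain suffix. Both function names are hypothetical, and a reverse lookup on its own is only a hint, since PTR records are controlled by whoever owns the IP block.

```python
import socket


def host_in_domain(hostname, domain):
    """True if hostname is the domain itself or one of its subdomains."""
    host = hostname.rstrip(".").lower()
    suffix = domain.strip(".").lower()
    return host == suffix or host.endswith("." + suffix)


def reverse_lookup(ip_address):
    """Return the PTR hostname for ip_address, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip_address)[0]
    except (socket.herror, socket.gaierror):
        return None


# The hostname is the one quoted in the post above; no network access needed here.
print(host_in_domain("mmscrm03-1.sac.overture.com", "overture.com"))
```

Comparing against "." plus the domain, rather than the bare suffix, keeps a host such as overture.com.example.net from passing the check.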
The Slurp spider has been checking robots.txt several times a day, almost hourly, but nothing else, for weeks now. I find myself reading that behavior like tea leaves, and I don't like it. The site is well established but absent from the Y! serps.
On the other hand, the Yahoo Seeker spider is spidering deeply.
So what is Yahoo using currently? Inktomi or is it their own engine now? How does it work? Any tips on optimization?
|Multimedia crawler for Alltheweb, AV and other Overture partners Index. |
Hi Tim, it's great to have you on the forums.
Is this a fair assessment?
+ Scooter still finds results for AV and AV is still a standalone search engine in its own right.
+ FAST still finds results for AtW and AtW is still a standalone search engine in its own right.
However, new technologies developed by Yahoo! are likely to be single items shared by all the search engines owned by the company? Yahoo-MMCrawler is an example of this?
I think the new Yahoo results look really good. The web page summaries are especially fair.