Yahoo Renames Spider

   
12:41 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yahoo! will be modifying its web-indexing robot (crawler) user-agent to reflect our company name. This is the crawler we use to build a searchable index for the Yahoo! search services. To ensure consistency and minimal disruption, we will continue to maintain the "Slurp" name within our web crawler user agent and continue to support "Slurp" as part of any robots.txt file that references it.

Yahoo! Slurp will continue to obey the Robot Exclusion Standard (http://www.robotstxt.org/wc/exclusion.html).

The complete User-Agent that will be logged as part of this transition will be shown in your logs (if user-agent logging is enabled) as follows:

Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...] )

For more information on the Yahoo! Slurp user-agent, robots.txt usage or accesses please refer to the slurp information page now located at:

[help.yahoo.com...]

This transition will take effect starting Monday Feb 16th, 2004.

3:06 am on Feb 17, 2004 (gmt 0)

10+ Year Member



Yahlurp lives :)

I spotted Yahlurp in a couple logs this morning.

3:14 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



That is straight from Yahoo (thanks Tim [webmasterworld.com])
3:41 am on Feb 17, 2004 (gmt 0)

10+ Year Member



Let's hope they increase the speed of the spider!
3:44 am on Feb 17, 2004 (gmt 0)

10+ Year Member



To verify: the UA mentioned in the first post is exactly what I saw in the logs:

Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
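
For anyone who wants to check their own logs, here is a minimal sketch of one way to spot the new UA. It assumes your server writes Combined Log Format (user-agent as the last quoted field); the sample line, its IP, and the helper names are made up for illustration, and the help link is left truncated exactly as it appears in this thread.

```python
import re

# Combined Log Format ends with "referer" "user-agent";
# this pattern captures the last quoted field on the line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def user_agent(line):
    """Return the user-agent field of a log line, or None if not found."""
    match = UA_FIELD.search(line)
    return match.group(1) if match else None


def is_slurp(line):
    """True if the line's user-agent contains the 'Slurp' token (any case)."""
    ua = user_agent(line)
    return ua is not None and "slurp" in ua.lower()


# Hypothetical log line; only the UA string comes from the thread.
sample = (
    '192.0.2.1 - - [17/Feb/2004:03:44:00 +0000] "GET /robots.txt HTTP/1.0" '
    '200 68 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...] )"'
)
print(is_slurp(sample))
```

Matching on the "Slurp" token rather than the whole string should keep working for both the old and the renamed agent, since Yahoo says the name stays in the UA.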

5:15 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



cool :)

Now we have Yahoo's own noisily eating Web Spider :)

(Look at definition #4 of slurp [google.com] at google.)

Sid

7:59 am on Feb 17, 2004 (gmt 0)

10+ Year Member



Now we have Yahoo's own noisily eating Web Spider :)

(Look at definition #4 of slurp at google.)

Yeah, it always gives me a kick how web-centric the Google define: function is. You have to scroll waaay down the list for define:yahoo to find the traditional meaning of the term.

5:01 pm on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Anyone notice that Yahoo is archiving as well?

[help.yahoo.com...]

9:41 am on Feb 18, 2004 (gmt 0)

10+ Year Member



What I am seeing is that sites that were kicked out of Google following the recent filter hoohah, are included in the Yahoo listings.

Which is nice. For a start.

3:31 pm on Feb 18, 2004 (gmt 0)

10+ Year Member



I think this was a brave move by Yahoo!
I think they did the right thing.

I agree with their reason for keeping the Slurp name. Backwards compatibility, it's everyone's friend.

Why's it brave? Slurp was hardly a prestigious spider. I think it was pretty dumb and fairly cowardly; especially for query strings. I'm shooting from the hip but I suspect the most intelligent spider Yahoo! bought on their search-engine-fest was FAST. I'd see Fast Yahoo! as a more serious threat to Googlebot, but although I shouldn't judge the new spider by its name, I still snigger when I see Slurp's user agent.

Yahoo Shopper, of course, has been bimbling around robotland for a while now.

9:54 pm on Feb 18, 2004 (gmt 0)

10+ Year Member



Anyone notice that Yahoo is archiving as well?

Yes, and after a quick review of some large pages, it looks like it is caching pages over 101KB in size.

Tim

10:09 pm on Feb 18, 2004 (gmt 0)

10+ Year Member



We index 500K for HTML, which is larger than Google's 101K. This is published in SearchDay.
4:06 am on Feb 19, 2004 (gmt 0)

WebmasterWorld Senior Member powdork is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Can we call you TimothY!?
8:03 am on Feb 19, 2004 (gmt 0)

10+ Year Member



I spotted +Yahoo!+Slurp in my logs. However, it does not appear in my Sawmill report.
1:09 pm on Feb 19, 2004 (gmt 0)

10+ Year Member



Slurpoo! (heheheh) ;-)
9:03 pm on Feb 19, 2004 (gmt 0)

10+ Year Member



The new Y! spider is hitting files banned by my robots.txt file...

Anyone else seeing this?

10:15 pm on Feb 19, 2004 (gmt 0)

10+ Year Member



Well as far as I can see it's obeying robots.txt but I think Yahoo is rewriting some of my URLs like:
site.com/directory/?search=test
to:
site.com/directory?search=test

While my robots.txt says

Disallow: /directory/
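
If the rewrite really is happening, it would explain the hits: under the original exclusion standard a Disallow value is a plain path prefix, so "/directory/" does not cover "/directory?search=test". A rough sketch with Python's standard-library parser, assuming a "User-agent: *" line (the post only shows the Disallow line) and using "Slurp" as the robot token:

```python
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
rules.parse([
    "User-agent: *",          # assumed; the post shows only the Disallow line
    "Disallow: /directory/",
])

original = "http://site.com/directory/?search=test"
rewritten = "http://site.com/directory?search=test"

# can_fetch() returns True when the URL is NOT blocked for that robot.
original_allowed = rules.can_fetch("Slurp", original)
rewritten_allowed = rules.can_fetch("Slurp", rewritten)
print(original_allowed, rewritten_allowed)
```

With this parser the original URL is blocked and the rewritten one is allowed. Dropping the trailing slash (Disallow: /directory) should cover both forms, at the cost of also blocking any other path that starts with "/directory".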
2:18 am on Feb 20, 2004 (gmt 0)

10+ Year Member



Tim, thanks for the heads up about the searchday article
9:05 pm on Feb 20, 2004 (gmt 0)

10+ Year Member



I'm seeing a bot in my log files:

Yahoo-MMCrawler/3.x (mm dash crawler at trd dot overture dot com)

that resolves to mmscrm03-1.sac.overture.com and the whois says overture.com;

but I also get -

Network Data
Network id#: 1
Qwest Cybercenters QWEST-CYBERCENTER-2 (NET-66-77-0-0-1)
66.77.0.0 - 66.77.255.255
Fast Search, Inc. QWEST-MCC-FASTSRCH3 (NET-66-77-73-0-1)
66.77.73.0 - 66.77.73.255

ARIN WHOIS database, last updated 2004-02-19 19:15

is this a new crawler that's using a FAST/Overture combo of some sort..?
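
For what it's worth, the two WHOIS entries are nested rather than conflicting: the Fast Search block sits inside the larger Qwest block. A quick sketch that checks this with the ranges copied from the output above (the single test address is hypothetical, not one taken from my logs):

```python
import ipaddress


def to_networks(first, last):
    """Convert a first-last address range into a list of CIDR networks."""
    return list(ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)))


# Ranges as listed in the ARIN WHOIS output.
qwest = to_networks("66.77.0.0", "66.77.255.255")
fast = to_networks("66.77.73.0", "66.77.73.255")

# True if the Fast Search range is a sub-allocation of the Qwest range.
nested = fast[0].subnet_of(qwest[0])

# Hypothetical crawler address, for illustration only.
hit = ipaddress.ip_address("66.77.73.10")
in_fast = hit in fast[0]
print(qwest, fast, nested, in_fast)
```

So an address in that /24 shows up under both names: Qwest as the upstream holder and Fast Search as the customer assigned the smaller block.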

Tim

9:13 pm on Feb 20, 2004 (gmt 0)

10+ Year Member



Multimedia crawler for Alltheweb, AV and other Overture partners Index.
12:41 pm on Feb 21, 2004 (gmt 0)



The Slurp spider has been checking robots.txt several times a day, almost hourly, but nothing else, for weeks now. I find myself reading that behavior like tea leaves, and I don't like it. The site is well established but absent from the Y! SERPs.

On the other hand, the Yahoo Seeker spider is spidering deeply.

*shrug*

10:07 pm on Feb 22, 2004 (gmt 0)

10+ Year Member



So what is Yahoo using currently? Inktomi or is it their own engine now? How does it work? Any tips on optimization?
9:59 am on Feb 23, 2004 (gmt 0)

10+ Year Member



Multimedia crawler for Alltheweb, AV and other Overture partners Index.

Hi Tim, it's great to have you on the forums.

Is this a fair assessment?
+ Scooter still finds results for AV, and AV is still a stand-alone search engine in its own right.
+ FAST still finds results for AtW, and AtW is still a stand-alone search engine in its own right.

However, new technologies developed by Yahoo! are likely to be single items shared by all the search engines owned by the company? Yahoo-MMCrawler is an example of this?

I think the new Yahoo results look really good. The web page summaries are especially fair.