
Yahoo Search Engine and Directory Forum

    
Yahoo Renames Spider
Brett_Tabke




msg:837806
 12:41 am on Feb 17, 2004 (gmt 0)

Yahoo! will be modifying its web-indexing robot (crawler) user-agent to reflect our company name. This is the crawler we use to build a searchable index for the Yahoo! search services. To ensure consistency and minimal disruption, we will continue to maintain the "Slurp" name within our web crawler user agent and continue to support "Slurp" as part of any robots.txt files that reference it.

Yahoo! Slurp will continue to obey the Robot Exclusion Standard (http://www.robotstxt.org/wc/exclusion.html).
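Since robots.txt records addressed to Slurp are meant to keep working, here is a minimal sketch of checking one with Python's standard urllib.robotparser. The robots.txt content and the example.com URLs are hypothetical. The crawler is identified by its product token "Slurp" rather than the full User-Agent string, because CPython's parser only compares a record's name against the part of the string before the first "/", which for the new UA is just "Mozilla".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record for Slurp, one catch-all record.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/

User-agent: *
Disallow:
"""

def slurp_may_fetch(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler identifying itself as 'Slurp' may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the product token, not "Mozilla/5.0 (compatible; Yahoo! Slurp; ...)".
    return parser.can_fetch("Slurp", url)

if __name__ == "__main__":
    print(slurp_may_fetch("http://www.example.com/private/page.html"))  # False
    print(slurp_may_fetch("http://www.example.com/index.html"))         # True
```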

As part of this transition, the complete User-Agent will appear in your logs (if user-agent logging is enabled) as follows:

Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...] )
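Because Yahoo! keeps the Slurp token, a log filter can key on the token instead of the exact string. A minimal Python sketch; the category names are hypothetical, and the separator between "Yahoo!" and "Slurp" is matched loosely so that a log which writes spaces as "+" (as in the +Yahoo!+Slurp entry reported later in this thread) still matches.

```python
import re

# Any crawler carrying the "Slurp" token as a whole word, case-insensitively.
SLURP = re.compile(r"\bslurp\b", re.IGNORECASE)
# The renamed form "Yahoo! Slurp"; any run of non-word characters (space, "+")
# may separate the two words.
YAHOO_SLURP = re.compile(r"yahoo!\W*slurp\b", re.IGNORECASE)

def classify_user_agent(user_agent: str) -> str:
    """Return 'yahoo-slurp', 'slurp' or 'other' for a User-Agent string."""
    if YAHOO_SLURP.search(user_agent):
        return "yahoo-slurp"
    if SLURP.search(user_agent):
        return "slurp"
    return "other"
```

For example, classify_user_agent("Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...] )") returns "yahoo-slurp", while a UA that carries only the bare Slurp token falls into "slurp".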

For more information on the Yahoo! Slurp user-agent, robots.txt usage or accesses, please refer to the Slurp information page, now located at:

[help.yahoo.com...]

This transition will take effect starting Monday, Feb 16th, 2004.


 

pendanticist




msg:837807
 12:43 am on Feb 17, 2004 (gmt 0)

[webmasterworld.com...]

4serendipity




msg:837808
 3:06 am on Feb 17, 2004 (gmt 0)

Yahlurp lives :)

I spotted Yahlurp in a couple of logs this morning.

Brett_Tabke




msg:837809
 3:14 am on Feb 17, 2004 (gmt 0)

That is straight from Yahoo (thanks, Tim [webmasterworld.com]).

markis00




msg:837810
 3:41 am on Feb 17, 2004 (gmt 0)

Let's hope they increase the speed of the spider!

4serendipity




msg:837811
 3:44 am on Feb 17, 2004 (gmt 0)

To verify: the UA mentioned in the first post is exactly what I saw in the logs:

Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]

sidyadav




msg:837812
 5:15 am on Feb 17, 2004 (gmt 0)

cool :)

Now we have Yahoo's own noisily eating Web Spider :)

(Look at definition #4 of slurp [google.com] at google.)

Sid

4serendipity




msg:837813
 7:59 am on Feb 17, 2004 (gmt 0)

Now we have Yahoo's own noisily eating Web Spider :)

(Look at definition #4 of slurp at google.)

Yeah, it always gives me a kick how web-centric the Google define: function is. You have to scroll waaay down the list for define:yahoo to find the traditional meaning of the term.

Kirby




msg:837814
 5:01 pm on Feb 17, 2004 (gmt 0)

Anyone notice that Yahoo is archiving as well?

[help.yahoo.com...]

GlynMusica




msg:837815
 9:41 am on Feb 18, 2004 (gmt 0)

What I am seeing is that sites that were kicked out of Google following the recent filter hoohah are included in the Yahoo listings.

Which is nice. For a start.

Wail




msg:837816
 3:31 pm on Feb 18, 2004 (gmt 0)

I think this was a brave move by Yahoo!
I think they did the right thing.

I agree with their reason for keeping the Slurp name. Backwards compatibility, it's everyone's friend.

Why's it brave? Slurp was hardly a prestigious spider. I think it was pretty dumb and fairly cowardly, especially with query strings. I'm shooting from the hip, but I suspect the most intelligent spider Yahoo! bought on their search-engine-fest was FAST. I'd see Fast Yahoo! as a more serious threat to Googlebot, but although I shouldn't judge the new spider by its name, I still snigger when I see Slurp's user agent.

Yahoo Shopper, of course, has been bimbling around robotland for a while now.

4serendipity




msg:837817
 9:54 pm on Feb 18, 2004 (gmt 0)

Anyone notice that Yahoo is archiving as well?

Yes, and after a quick review of some large pages, it looks like it is caching pages over 101KB in size.

Tim




msg:837818
 10:09 pm on Feb 18, 2004 (gmt 0)

We index up to 500K of HTML, larger than Google's 101K. This was published in SearchDay.
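To see what those two limits mean for a particular page, a rough sketch, assuming that each engine keeps only the first N kilobytes of an HTML file and that 1K = 1024 bytes (neither post says which). The constant and function names are hypothetical.

```python
# Limits as reported in this thread; 1K = 1024 bytes is an assumption.
GOOGLE_CACHE_LIMIT = 101 * 1024
YAHOO_HTML_LIMIT = 500 * 1024

def indexed_fraction(page_bytes: int, limit_bytes: int) -> float:
    """Fraction of a page that falls within a size limit (1.0 if it fits).

    Assumes everything past the limit is simply not indexed.
    """
    if page_bytes <= 0:
        return 1.0
    return min(page_bytes, limit_bytes) / page_bytes
```

For a 202K page, indexed_fraction(202 * 1024, GOOGLE_CACHE_LIMIT) is 0.5, while the same page fits entirely under the 500K limit.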

Powdork




msg:837819
 4:06 am on Feb 19, 2004 (gmt 0)

Can we call you TimothY!?

newsphinx




msg:837820
 8:03 am on Feb 19, 2004 (gmt 0)

I spot +Yahoo!+Slurp in my logs. However, it does not appear in my Sawmill report.

stripey




msg:837821
 1:09 pm on Feb 19, 2004 (gmt 0)

Slurpoo! (heheheh) ;-)

farside847




msg:837822
 9:03 pm on Feb 19, 2004 (gmt 0)

The new Y! spider is hitting files banned by my robots.txt file...

Anyone else seeing this?

misja




msg:837823
 10:15 pm on Feb 19, 2004 (gmt 0)

Well, as far as I can see it's obeying robots.txt, but I think Yahoo is rewriting some of my URLs, like:
site.com/directory/?search=test
to:
site.com/directory?search=test

While my robots.txt says
Disallow: /directory/
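In the Robots Exclusion Standard a Disallow value is a path prefix, so /directory/ only blocks URLs whose path starts with the trailing slash. A sketch with Python's standard urllib.robotparser, applying the rule above to every crawler and reusing the poster's placeholder site.com:

```python
from urllib.robotparser import RobotFileParser

# The rule quoted above, applied here to all crawlers.
RULES = ["User-agent: *", "Disallow: /directory/"]

def allowed(url: str) -> bool:
    """Return True if the rules above let Slurp fetch url."""
    parser = RobotFileParser()
    parser.parse(RULES)
    return parser.can_fetch("Slurp", url)

print(allowed("http://site.com/directory/?search=test"))  # False: path starts with /directory/
print(allowed("http://site.com/directory?search=test"))   # True: no trailing slash, so no prefix match
```

Dropping the slash (Disallow: /directory) would block both forms, but as a prefix it would also block paths such as /directory2/ or /directory.html, so it is worth checking for sibling paths before changing the rule.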

4serendipity




msg:837824
 2:18 am on Feb 20, 2004 (gmt 0)

Tim, thanks for the heads-up about the SearchDay article.

a_chameleon




msg:837825
 9:05 pm on Feb 20, 2004 (gmt 0)

I'm seeing a bot in my log files:

Yahoo-MMCrawler/3.x (mm dash crawler at trd dot overture dot com)

that resolves to mmscrm03-1.sac.overture.com and the whois says overture.com;

but I also get -

Network Data
Network id#: 1
Qwest Cybercenters QWEST-CYBERCENTER-2 (NET-66-77-0-0-1)
66.77.0.0 - 66.77.255.255
Fast Search, Inc. QWEST-MCC-FASTSRCH3 (NET-66-77-73-0-1)
66.77.73.0 - 66.77.73.255

ARIN WHOIS database, last updated 2004-02-19 19:15

Is this a new crawler that's using a FAST/Overture combo of some sort?
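One common way to confirm that a visitor really belongs to the network its hostname claims is forward-confirmed reverse DNS: look up the PTR name for the IP, check the domain, then resolve that name and make sure the original IP is among the results. This is a general technique, not something Yahoo! or Overture documented in this thread. A minimal Python sketch; the resolvers are parameters so they can be swapped for fakes, and the function name is hypothetical.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# socket.gethostbyaddr and socket.gethostbyname_ex both return
# (hostname, aliaslist, ipaddrlist).
HostInfo = Tuple[str, List[str], List[str]]

def verify_crawler_ip(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], HostInfo] = socket.gethostbyaddr,
    forward: Callable[[str], HostInfo] = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed crawler IP.

    Suffixes should start with "." (e.g. ".overture.com") so that a name
    like "notoverture.com" is not accepted.
    """
    # Step 1: reverse (PTR) lookup of the visiting IP.
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    host = hostname.lower().rstrip(".")
    # Step 2: the name must sit inside one of the expected domains.
    suffixes = [s.lower() for s in allowed_suffixes]
    if not any(host.endswith(s) or host == s.lstrip(".") for s in suffixes):
        return False
    # Step 3: the forward lookup of that name must lead back to the same IP.
    try:
        _name, _aliases, addresses = forward(host)
    except OSError:
        return False
    return ip in addresses
```

With real DNS, verify_crawler_ip(ip, [".overture.com"]) returns True only if the PTR name ends in overture.com and resolves back to the same address; a spoofed PTR record alone is not enough.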

Tim




msg:837826
 9:13 pm on Feb 20, 2004 (gmt 0)

Multimedia crawler for the index used by Alltheweb, AV and other Overture partners.

bonanza




msg:837827
 12:41 pm on Feb 21, 2004 (gmt 0)

The Slurp spider has been checking robots.txt several times a day, almost hourly, but nothing else, for weeks now. I find myself reading that behavior like tea leaves, and I don't like it. The site is well established but absent from the Y! SERPs.
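Rather than reading tea leaves, the pattern can be measured from the access log. A minimal sketch, assuming Apache's combined log format and a simple case-insensitive substring test for "slurp" in the User-Agent; the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Apache "combined" format:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def slurp_fetch_profile(lines: Iterable[str]) -> Counter:
    """Count Slurp requests for /robots.txt versus everything else."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        # Skip unparseable lines and any UA without "slurp" in it.
        if not m or "slurp" not in m.group("ua").lower():
            continue
        path = m.group("path").split("?", 1)[0]  # ignore query strings
        counts["robots.txt" if path == "/robots.txt" else "other"] += 1
    return counts
```

A profile with a large robots.txt count and an "other" count near zero over several weeks would put numbers on the behavior described above.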

On the other hand, the Yahoo Seeker spider is spidering deeply.

*shrug*

cityres




msg:837828
 10:07 pm on Feb 22, 2004 (gmt 0)

So what is Yahoo using currently? Inktomi or is it their own engine now? How does it work? Any tips on optimization?

Wail




msg:837829
 9:59 am on Feb 23, 2004 (gmt 0)

Multimedia crawler for Alltheweb, AV and other Overture partners Index.

Hi Tim, it's great to have you on the forums.

Is this a fair assessment?
+ Scooter still finds results for AV, and AV is still a stand-alone search engine in its own right.
+ FAST still finds results for AtW, and AtW is still a stand-alone search engine in its own right.

However, are new technologies developed by Yahoo! likely to be single items shared by all the search engines the company owns? Is Yahoo-MMCrawler an example of this?

I think the new Yahoo results look really good. The web page summaries are especially fair.
