Moderators: martinibuster

Yahoo Search Engine and Directory Forum

Strange spider behaviour ..
Is anyone else seeing this?

 1:45 pm on Feb 21, 2004 (gmt 0)

- - [20/Feb/2004:14:56:28 -0600] "GET /robots.txt HTTP/1.0" 200 825 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
- - [20/Feb/2004:16:50:26 -0600] "GET /robots.txt HTTP/1.0" 200 825 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
- - [20/Feb/2004:18:56:21 -0600] "GET /robots.txt HTTP/1.0" 200 825 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
- - [20/Feb/2004:20:28:43 -0600] "GET /robots.txt HTTP/1.0" 200 825 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]

This pattern's been repeating for the last 48 hours. Any ideas what the deal could be? Fairly clean robots.txt and never had a problem with the old ink or google spiders.
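
For anyone wanting to check their own logs for the same pattern, here is a rough Python sketch. It assumes the server writes the standard combined log format with the host field present, and it treats any user agent containing "Slurp" as the crawler. The function names (slurp_paths, only_fetches_robots) are made up for illustration and the sample lines use a documentation IP address, not a real crawler address.

```python
import re
from collections import Counter

# Combined log format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def slurp_paths(lines):
    """Count the paths requested by any user agent containing 'Slurp'."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and "slurp" in m.group("agent").lower():
            counts[m.group("path")] += 1
    return counts


def only_fetches_robots(counts):
    """True if Slurp was seen and never asked for anything but /robots.txt."""
    return bool(counts) and set(counts) == {"/robots.txt"}


if __name__ == "__main__":
    # Hypothetical sample lines; 192.0.2.1 is a documentation address.
    agent = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
    sample = [
        '192.0.2.1 - - [20/Feb/2004:14:56:28 -0600] "GET /robots.txt HTTP/1.0" 200 825 "-" "%s"' % agent,
        '192.0.2.1 - - [20/Feb/2004:16:50:26 -0600] "GET /robots.txt HTTP/1.0" 200 825 "-" "%s"' % agent,
    ]
    counts = slurp_paths(sample)
    print(counts, only_fetches_robots(counts))
```

To run it against a real log, pass an open file object to slurp_paths instead of the sample list. If the flag comes back true over several days of logs, you are seeing the same robots.txt-only behaviour described above.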



 2:49 pm on Feb 21, 2004 (gmt 0)

Perhaps just to get your attention and make you wonder why you are not in the index, or are soon to be dropped from the index, and that it's time for you to consider PFI? lol


 4:27 pm on Feb 21, 2004 (gmt 0)

Please send examples of your log files and your domains via stickymail and I will take care of it.


 6:21 pm on Feb 21, 2004 (gmt 0)


Done. Appreciate the help. :)


 7:27 pm on Feb 21, 2004 (gmt 0)

"Is anyone else seeing this?"
Kinda...

- - [21/Feb/2004:08:54:52 -0500] "GET /robots.txt HTTP/1.0" 200 939 "-" "Mozilla/5.0 (Slurp/si; slurp@inktomi.com; [inktomi.com...]
- - [21/Feb/2004:08:54:53 -0500] "GET / HTTP/1.0" 200 67542 "-" "Mozilla/5.0 (Slurp/cat; slurp@inktomi.com; [inktomi.com...]

No pattern though. It didn't visit me for 2 days, and now this. It was doing it almost daily before.


 8:57 pm on Feb 21, 2004 (gmt 0)

>Please send examples of your log files and your domains via stickymail and I will take care of it.

Does that include anyone who may read this thread with similar problems?


 9:44 pm on Feb 21, 2004 (gmt 0)

There used to be an email address for Slurp support on the site that you could send problems to directly. I will find out what it is so I don't become an intermediary. Until Monday, please stickymail me; then I will post a new contact email.


 3:54 pm on Feb 23, 2004 (gmt 0)

Hey Tim,

Any ETA on the address? I've got a few private messages asking me if there was a magical fix. Would rather send them to the proper addy.


 4:36 pm on Feb 23, 2004 (gmt 0)

It is webmasterworldfeedback@yahoo.com


 1:53 am on Feb 24, 2004 (gmt 0)

Thanks, I resubmitted my problem there to make sure it gets on the queue. Owe you a drink if you're at PubConf.


 2:05 am on Feb 24, 2004 (gmt 0)

Hi Tim. I hope you are going to be able to really help people. I sent a message about a week ago to the INK address listed on their site and got a form mail reply about what gets people booted from the index. I'm pretty sure the reason many people are not in the index is because we once paid for spidering and now don't. (BTW, the reason isn't listed on the form mail sent out) It's strange that only the sites that once paid and now don't are gone and other sites we have that are in the same format are still included but never were in the PPI program. You seem like a really nice person Tim, but the BS form mail doesn't really help people and I hope you will be able to give us all clear answers instead of maybe this or maybe that kind of things. Thanks for listening.


 5:37 pm on Feb 27, 2004 (gmt 0)

I can't believe that I'm still seeing requests for files that have been 404'd for what is now well over a year!

Month after month it came back for the old, dead files.

Posts and e-mails pondering why a bot should be so stupid came and went without solutions.

Line after line of convoluted access_log files containing the same redundant requests.

And now?

And now the cycle begins anew........

66.***.**.40 - - [26/Feb/2004:21:21:04 -0800] "GET /Blah-ishblah.html HTTP/1.0" 404 2847 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

Ain't progress a wonderful thing?
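
If you want to put numbers on this, a small script can tally which dead URLs the bot keeps coming back for. The following is only a sketch under a few assumptions: the log is in combined log format, timestamps look like the one above, and the month names are in English. The helper name dead_paths_still_crawled is invented for this example.

```python
import re
from datetime import datetime

# Pull the timestamp, path, status and user agent out of a combined-format log line.
LINE_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?:\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def dead_paths_still_crawled(lines, agent_token="Slurp"):
    """Map each 404'd path the crawler requested to (hits, first_seen, last_seen)."""
    seen = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m or m.group("status") != "404":
            continue
        if agent_token.lower() not in m.group("agent").lower():
            continue
        # Example timestamp: 26/Feb/2004:21:21:04 -0800
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        path = m.group("path")
        hits, first, last = seen.get(path, (0, when, when))
        seen[path] = (hits + 1, min(first, when), max(last, when))
    return seen


if __name__ == "__main__":
    # "access_log" is a placeholder file name; point it at your own log.
    with open("access_log") as fh:
        for path, (hits, first, last) in sorted(dead_paths_still_crawled(fh).items()):
            print(path, hits, first.date(), last.date())
```

A wide gap between first and last seen for the same path would back up the "month after month" complaint with dates.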


 6:28 pm on Feb 27, 2004 (gmt 0)

My hope is that Yahoo will combine the results of Alltheweb, AltaVista and Inktomi. AltaVista and Alltheweb have more current data in my view because they seem to crawl a site quicker and more efficiently.


 7:51 pm on Feb 27, 2004 (gmt 0)

I don't know whether they will be combining the indexes and crawler capabilities of all their owned search engines. But they intend to offer one Paid Inclusion Product to have you added to all three:

"As of the 1st of March 2004, we will no longer be accepting URLs for inclusion via:

- Inktomi Search Submit
- AltaVista Express Submit
- Fast PartnerSite

Pricing and other details of new and exciting programs to replace the above services will soon be provided. The new products will include all previously supported engines and more."

They gave no indication that they were going to do this (and they haven't commented on the above mailing) so your guess is as good as mine as to what they intend to do.


 8:04 pm on Feb 27, 2004 (gmt 0)

Let's keep this thread on-topic, please. The original post defines this thread's subject as problems with the Yahoo-branded Slurp spider. Comments about PFI and other subjects belong in a new or separate thread.



 9:25 pm on Feb 27, 2004 (gmt 0)

Okay, to justify my last message, the implication was that Yahoo is slowing down all the free crawling activity by its respective crawlers (Slurp, FAST, Scooter) in preparation for a new Paid Inclusion project. This is why you may see strange bot hits to your server.

Sorry if that did not seem clear from my original message above jdMorgan.

My opinion is that Yahoo intends to replace the crawlers/bots due to a merger of its search technology and that Slurp is still allowed to hit sites, but is not currently indexing them. Tim (Yahooguy in the forum) said that Slurp was being replaced with YahooSlurp did he not? I take this to mean not just a simple renaming, but a completely new crawler.


 9:49 pm on Feb 27, 2004 (gmt 0)

Hi Mark Hutch,
Yes, I sent your request on to support. They handled it via a standard response. That group generally deals with requests via formatted responses.


 11:02 am on Feb 28, 2004 (gmt 0)

Ah, Timmy, Timmy. Any chance you're going to tell us what's in the future for the crawlers and the paid inclusion programs and how they will interlink?

Yahoo were very upfront and open about the fact they intended to use Inktomi in the Yahoo Search Engine (which of course got thousands more PFI Inktomi sales) only to turn around and say - "Well, we didn't mean THAT part of Inktomi!"

Why the secrecy? Why the helpfulness to individual matters (which, by the way, is working a treat in the PR department) but no actual information to help us prepare for the new product.
Keeping quiet about Yahoo crawler's current activities and not even mentioning the scrapping of three PFI programs to mould a new single program makes Yahoo look a little flaky. People paid out shed-loads for Directory Inclusion and Inktomi Inclusion and now curse their mistake. With all these secret changes do you really think people are going to keep throwing their money Yahoo's way?

.... yeah, you're right, they probably will ...


 2:38 pm on Feb 28, 2004 (gmt 0)

How about this UA? Just got this lately going nuts on one of my sites.

Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; [alltheweb.com...]


 4:22 pm on Feb 28, 2004 (gmt 0)

I have a site with exactly the same situation. Slurp repeatedly grabs robots.txt, but otherwise won't touch the site.

The site is in Y!s index, but isn't refreshed and gets very little traffic. The pages present in the index seem to be age-old.

The site is on a sub-domain of a .com domain, like widgets.example.com.


 9:26 am on Mar 3, 2004 (gmt 0)

Ok. Tim mentioned this very specific situation at PubCon as an indication that there is a penalty on your site.

Unfortunately, he left a day earlier than I had expected and I never did get the opportunity to talk to him.

-- Is their engine sophisticated enough to figure out that layers are used in menus, so hidden divs don't automagically mean hidden text?

-- Are affiliate links likely to get you banned? Amazon Buybox? Ebay Feed? CJ Links?

-- Is Y!Slurp just plain old broken (I have indications it is just Slurp with a new user agent, as I see some *old* lame URLs with session IDs being slurped every few days - despite no inbound links to those IDs)?


 9:34 am on Mar 3, 2004 (gmt 0)

That begs the question, what have I done to deserve any such penalty? The site is quite clean.


 9:36 am on Mar 3, 2004 (gmt 0)

There's a discussion over here that might explain it:

But that just leaves another question: how do I get out of that trap?
