|Yahoo Slurp struggles to get out of first gear|
*continues to drum fingers on table impatiently*
My site has been online for two years and has had about 3000 pages of content for over six months with plenty of incoming links.
We get >10,000 hits from Googlebot per month, >10,000 hits from Google's Mediapartners bot (Adsense) per month, but only about 500 hits from Yahoo's Slurp bot per month.
Google fully indexed the content within a few weeks of it going live. Yahoo has never had more than 520 pages in its index.
Unsurprisingly, traffic from Yahoo stumbles along at approximately 5% of traffic from Google.
Please tell me what I have to do to get my site crawled properly - six months and still waiting is frankly ridiculous.
Six months? Good God.
I'm having the same problem. I'm doing very well in Google and very poorly in Yahoo/MSN. About 6% of traffic currently for my most important site. All sorts of strangeness. Some pages disappeared. I've found others that I took down over a month ago still highly ranked, although they point to nowhere. And just today, the ultimate: for a rather competitive keyword worth a good deal, Yahoo is ranking highly the links page of a site that has absolutely nothing to do with the topic, rather than the optimized target page that links page points to, a high-PR page with lots of links that has everything to do with the search topic (and which still isn't even indexed at all).
The only thing I can think of is that if your pages have a lot of duplicate content, maybe Yahoo doesn't pick them up (although, then again, it seems to pick up my competitors' duplicate-content pages). Too many title keywords seems to be bad… all the pages I have that do well in Yahoo were inconsequential pages where I just slapped up a 3- or 4-word title and a short description with little keyword concentration. They also seem to be domain-name happy, weighting domain names heavily. Then again, I really have no idea. In my case, the lamer the site, the better it seems to do in Yahoo. Yahoo seems to be the anti-Google at the moment. All I can say is that I can't wait for the new MSN so I'm not shut out of both engines.
All Yahoo’d Out,
Any comments from the Yahoo reps here? The site's in my profile. Does Yahoo have the capacity to look at apparent problems related to Slurp spidering specific sites?
We have the exact same problem. Google spiders our site perfectly, thoroughly, and completely. Yahoo--just the home page.
On the 12th, the 13th, and today the 14th, Slurp spidered my Swedish homepage (not the main one).
That's the only page Slurp has spidered these days, and it hasn't changed.
Just thinking... doing a search in Yahoo for travel gives a result of 209,000,000 pages; doing the same search in Google gives 181,000,000...
That must mean Yahoo has 28,000,000 pages indexed and crawled by Slurp, and the rest are Google pages indexed in Yahoo without any rankings... Am I right?
Actually doing site:http://www.mysite.com in yahoo shows 285 pages indexed.
In one Yahoo database I have 8 pages indexed in the SERPs; in another I have 15 different pages.
They spidered more than 200 pages some weeks ago, and let in about one every day or two. Will my whole site be indexed within a year? :)
I think carrying this Google backfill must take a lot of space...
Why don't they just drop Google, spider more, and let spidered pages in quicker?
How long does Yahoo's contract with Google run? Very strange.
I have the same problem. Google is about 50 times more traffic than Yahoo, ever since Yahoo stopped using Google's results. Before that, Google traffic was only 4 times that of Yahoo.
I've got a website with 10,000+ dynamic pages. I converted them to static (htm) pages using mod rewrite, but it didn't seem to help.
I wrote Yahoo to make sure that I wasn't being penalized, and they assured me that I wasn't.
I do have a lot of pages that could be considered duplicates because they come from a database - so a lot of the content is the same (like the page headers/footers, perhaps 90% of the html).
Could it be that Yahoo is just not indexing dynamic websites that pro-actively?
It's interesting to note that Yahoo still has access to Google and will display Google results, but only for obscure queries, for instance a four-word term that exactly matches a page title (or something like that). For example, if you do "site:www.domain.org filetype:htm" you will see all of your site indexed on Yahoo, whereas a regular "site:www.domain.org" only shows me 800 pages.
Yahoo is driving me crazy - I just don't know what to do. I really don't like it that literally 97% of my search engine traffic is coming from Google or a search engine that shows Google results.
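For anyone wondering what the mod_rewrite conversion mentioned above looks like in practice, here's a minimal sketch. The path, filename, and parameter name are invented for illustration; your own rule would use whatever your script actually expects:

```apache
# Hypothetical rule: serve the static-looking URL /widgets/123.htm
# from the underlying dynamic script page.php?id=123.
# Requires mod_rewrite to be enabled on the server.
RewriteEngine On
RewriteRule ^widgets/([0-9]+)\.htm$ /page.php?id=$1 [L]
```

The visitor (and the spider) sees only the `.htm` URL; the dynamic script and its query string never appear in the address.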
John... Why don't you try contacting Yahoo_Mike? He recently made a post here: [webmasterworld.com...] It seems no one has paid it much attention... maybe someone could sticky mail him about all this weak spidering and general slowness and report back.
Personally, I'm Yahoo'd out... I'm going to ignore Yahoo save a bit more complaining. I do respect the company generally so hopefully they're making improvements. Inktomi just plain sucks. I have one site listed in DMOZ AND Yahoo Dir and still nada.
Personally, I think hard work should be rewarded with higher SERPs. People who put a lot of effort into marketing more often than not also put a lot of effort into the content of their websites. With the new Yahoo, in our case there is a low-PR GeoCities page with 66 words on it coming up #1 for a very competitive keyword combo. These are the superior results I've been reading so much about?
If Yahoo doesn't change, at least we can look forward to its coming divorce with MSN next year. I think if you do well in Google, you'll do very well in the new MSN. [techpreview.search.msn.com...]
It's a little buggy, they have a serious clustering problem, but so far I'm LOVING it. It's almost enough to make one forgive Bill Gates...
My website uses dynamic pages. Back in September 2003, I changed the format of the ID in the URL. I changed it from foopage.php?id=1 to foopage.php?iGid=1
A couple months ago I decided to give Yahoo 404 errors when it got the format wrong (previously it was just getting a duplicate page with next to zero content). I figured it would index my pages correctly over the course of a couple months.
Google had no problem making the adjustment. Google works fine whether I use dynamic or static pages.
So now Yahoo still visits a handful of pages every day or so, collects the 404 error message, and that's it!
I also switched to using static pages (htm - using mod rewrite) but that didn't seem to help. Does anyone have an idea as to whether Yahoo prefers static-looking pages?
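For what it's worth, one alternative to serving 404s for the old URL format is a 301 (permanent) redirect from the old parameter name to the new one, so engines can carry the old URLs over instead of re-discovering everything. A hedged sketch in mod_rewrite, reusing the `foopage.php?id=` / `?iGid=` names from the post above (check the pattern against your own setup before using it):

```apache
RewriteEngine On
# Match only when the query string still uses the old parameter name (id=)
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
# Permanently redirect foopage.php?id=N to foopage.php?iGid=N.
# %1 is the value captured by the RewriteCond above; the redirect
# replaces the query string, so the rule cannot loop.
RewriteRule ^foopage\.php$ /foopage.php?iGid=%1 [R=301,L]
```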
A few months ago I implemented modrewrite so dynamic pages look as though they have static url's.
I had been using static pages with iframes and I let those pages go as 404's. Google dropped the 404 pages and included the new ones in short order.
I don't believe it's that yahoo prefers static-looking url's, I don't think it's the spider at all. I've seen plenty of crawling, but 404 pages seem to linger forever and new pages creep in, but don't rank. I'd say that slurp spiders the pages no problem, but the index is long overdue for a major update based on what's been collected.
|Why don't you try contacting Yahoo_Mike? |
Hah! Done that. Contact Tim? Done that. Email webmasterworld Yahoo feedback? Done that. Post problem in 'Questions for Yahoo!' Done that. Post problem in 'Answers from Yahoo!' Done that. Start my own thread? Done that. Contact Yahoo.uk support? Done that. Contact Inktomi? Done that. And never had any advice as to how I can get back into Yahoo serps.
My site is not banned and slurp visits regularly, but the pages it takes cannot be found in serps. All I get is emails from Overture saying how wonderful they are and how I can get traffic from Yahoo.
Silly me... I thought I was doing something innocuous when I paid for an Inktomi PFI page all those years ago...
Yahoo does a terrible job crawling. They want you to pay them for inclusion instead.
ding ding ding, drop them a feed at 15 cents a click, and ur in the top ;) it works well, trust me
Slurp has gone crazy, spidering files that don't exist and never existed,
as well as non-existent folders for existing pages.
It gets lots of 404s, and the IP is Inktomi's.
I'm seeing the same thing as Helen - and this is new. All of a sudden, slurp comes by and requests pages with names that have some partial component the same as one on my site (e.g., "GET /alphaA/pond/freegift.htm", where there is a page "pond.php" on my site, but no folder alphaA, nor a file freegift.htm)
How nice it would be to see Slurp hitting non-existent pages of my site, instead of seeing it hitting my robots.txt only a zillion times from April...:(=
|How nice it would be to see Slurp hitting non-existent pages of my site, instead of seeing it hitting my robots.txt only a zillion times from April...:(= |
I know how you feel.
My site was like that from January until a few weeks ago.
Still trying to forget these months.
I actually have the opposite problem LOL
I have great rankings on Yahoo but terrible on Google.
Right now it's around 70% Yahoo, 8% Google.
Yahoo sucks at "free" crawling. It's pretty basic: they want your money.
Just thought I'd add a quick comment re: removed pages getting 404s all the time...
It seems to me that if a page is removed for good and not replaced in any way, a 410 response would be the best thing to return instead of a 404...
|10.4.5 404 Not Found |
The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
|10.4.11 410 Gone |
The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval. If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead. This response is cacheable unless indicated otherwise.
The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server's site. It is not necessary to mark all permanently unavailable resources as "gone" or to keep the mark for any length of time -- that is left to the discretion of the server owner.
so, if you have taken down a page and it will never return or be replaced/redirected, t'would seem to be 410 time ;)
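To make the RFC's distinction concrete, here's a minimal sketch in Python of the decision it describes: if the server knows a resource is gone for good with no forwarding address, answer 410; if it simply has nothing matching the URL, answer 404. The page paths and the two sets are invented purely for illustration; a real site would drive this from its own routing or a removed-pages list:

```python
# Paths we know are permanently removed, with no forwarding address
# (RFC 2616 section 10.4.11: respond 410 Gone).
GONE_FOREVER = {"/promo/summer-2003.htm", "/staff/old-employee.htm"}

# Paths that currently exist on the site.
LIVE_PAGES = {"/index.htm", "/pond.php"}

def status_for(path: str) -> int:
    """Pick an HTTP status code per RFC 2616 sections 10.4.5 and 10.4.11."""
    if path in LIVE_PAGES:
        return 200
    if path in GONE_FOREVER:
        # We know this resource is intentionally, permanently unavailable.
        return 410
    # We don't know whether the absence is temporary or permanent.
    return 404
```

With this in place, a spider re-requesting a deliberately retired page gets a 410 and has explicit permission to drop it, while a typo'd or never-existing URL still gets the ordinary 404.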