So we had a good look at how our main e-commerce site is doing in the new Y!.
The site has a couple thousand pages. All of the pages are very easy to find (lots of backlinks) and easy to crawl (no funny characters in the URL strings, no dynamic pages, etc.). Nearly all the pages are doing well in G**gle and the other SEs.
About 20 of the pages in this site were submitted to INK PFI over a year ago. These pages are showing well in the new Y SERPs, as expected. A handful of the site's other pages are also showing well in the new Y SERPs.
BUT, nearly 70% of the pages are *not* showing up on searches where I'd expect to see them. For example, our blue widgets page is doing great for a search on 'blue widgets' but our red widgets page is nowhere on a search on 'red widgets' and our green widgets page is nowhere on a search for 'green widgets'.
I did not find this surprising, because it matches my view of the old INK index: *very* spotty.
I assumed that this meant that even though Y Slurp has crawled *all* of the pages in the site over the last couple months, nowhere near all of those pages actually made it into the database, for whatever reason.
Wrong.
WHAT WE FOUND
We realized last night that every single page in the site, except one, is in the new Y! db. We know this because they can be found by typing in the URL strings, which we had never bothered to check before.
I don't think that these pages are being penalized in any way, since they are similar if not identical to the pages that are doing well, they just cover different items/topics.
Rather, it's as if they are sitting in a waiting room, to be formally included with the next update.
Very odd, since Tim had suggested in his post about the Y inclusion program that the Y index was being updated regularly, rather like the G index:
"The primary means of generating our index is via our free crawl, using our new Yahoo! Slurp crawler. Yahoo! Slurp discovers pages by following links on the web. We update our index with a daily crawl to gather newly created and fast-changing URLs, as well as our main crawl which updates our index incrementally twice per week."
What seems to be happening is that the more recently acquired pages are now in the index, but are not yet eligible to appear in standard searches. Either that, or 70% of our pages are being individually penalized, but I'm almost certain that is not the case.
Clearly I'm no algo/index expert. Maybe someone else can shed light on the mechanics of this. Those of you who think that your pages have not been indexed yet may want to see if they are there on URL searches.
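For anyone who wants to keep track of this across a lot of pages, here is a rough sketch of how the results of the manual checks could be tallied. It assumes you have already run each search by hand and written down the outcome. Nothing here queries Y! or any other engine, and the page names, the `classify` and `summarize` helpers, and the sample data are all made up for illustration.

```python
from collections import Counter

def classify(found_by_url: bool, found_by_keyword: bool) -> str:
    """Bucket a page by the two manual checks described in the post."""
    if found_by_keyword:
        return "ranking"              # shows up on a normal keyword search
    if found_by_url:
        return "indexed_not_ranking"  # only found by typing in the URL string
    return "not_indexed"              # not found either way

def summarize(checks):
    """Return {bucket: (page count, percent of all checked pages)}."""
    counts = Counter(classify(url_hit, kw_hit) for _, url_hit, kw_hit in checks)
    total = len(checks)
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}

# Hypothetical hand-recorded results: (page, found by URL search, found by keyword search)
checks = [
    ("blue-widgets.html", True, True),
    ("red-widgets.html", True, False),
    ("green-widgets.html", True, False),
    ("old-page.html", False, False),
]

for label, (n, pct) in sorted(summarize(checks).items()):
    print(f"{label}: {n} page(s), {pct}%")
```

A large "indexed_not_ranking" bucket would match the waiting-room pattern described above; a large "not_indexed" bucket would point back to a crawl or inclusion problem instead.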
<added> Checked a little more. Definitely an old G cached page. Not a great sign, unless it's a placeholder of some sort, which I doubt. Maybe they're just defaulting to the old G db when they've got nothing to show. Suggests that Y doesn't have it indexed anyway...oh well. </added>
[edited by: caveman at 8:14 pm (utc) on Mar. 5, 2004]
However, if I do a search for my URL, the result that comes back is my Yahoo Directory listing. It appears this site is listed twice. The only one with decent rankings in the SERPs is the INK PFI page.
I'm still trying to figure out what all of this means.