Hmm, I think Sergey Brin should take a look at their own SERPs before saying that. Google indexes old cache from mid-2004, domain20%.com, and a lot of 404 pages are listed.
Try site:f in Google, or any other letter, and you will find a lot of 404s listed (a little tip for Yahoo's response).
Inflated is correct. Just look at Yahoo's own travel pages and see how many dupes are
Handbags at the ready, gentlemen.
I'm not sure a count of the "results" provided in response to a query is a valid indicator of index size or scope, since there are other factors involved in deciding whether or not a particular document is considered a "match" for a specific query.
The NY Times article includes a quote from someone who has seen indications that, at least with respect to French language queries, Y!'s index has grown substantially, as well as some skepticism that the size claims can be independently verified:
...Other search engine specialists remained skeptical about the ability to estimate Web or index size as long as the search engines were being secretive about their methods. "I don't have any good way of checking,"
Yeah, going by the number of results the engines say they find is not the best measure. G has long stated this is an approximation. In addition, G's initial number is almost always heavily inflated: most times I run a search, the initial "results returned" number changes significantly (downward) as I click through into deeper results.
I don't run enough searches on a regular basis to know if this is true at Yahoo as well though.
Some woman needs to tell these guys the size isn't as important as what you do with it.
>> Some woman needs to tell these guys the size isn't as important as what you do with it.
That's what they say so as not to hurt our feelings ;), but in search engines it's actually true, especially considering how my 1200-page site on Google shows 25,000+ pages.
How come? Well, each print, save, and send variant and each outbound link (redirect script) counts as a "page". I'm afraid to look at Yahoo, I might have a 50,000 page website.
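For anyone who wants to see how that kind of inflation adds up, here is a rough sketch. It assumes the print/save/send variants are ordinary query parameters named `print`, `save`, and `send`; those names and the example.com URLs are made up for illustration, and the redirect-script links are not handled at all.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Hypothetical parameter names for the print/save/send variants; a real
# site would use whatever its own scripts expect.
ACTION_PARAMS = {"print", "save", "send"}

def canonical(url):
    """Strip the action parameters so every variant maps to one page URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in ACTION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def count_pages(urls):
    """Return (raw URL count, distinct canonical page count)."""
    return len(urls), len({canonical(u) for u in urls})

# Illustrative URLs only: one article with three variants, plus a second article.
urls = [
    "http://example.com/article?id=7",
    "http://example.com/article?id=7&print=1",
    "http://example.com/article?id=7&save=1",
    "http://example.com/article?id=7&send=1",
    "http://example.com/article?id=8",
]
raw, distinct = count_pages(urls)
print(raw, distinct)  # 5 2
```

Five crawlable URLs, two real pages. Multiply that by a thousand-odd pages and a 20x overcount stops looking strange.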
The numbers game is exactly that, a game. The most important questions are not about the quantity, but are about the quality of data stored, and the ability to deliver quality serps.
With all due respect to the intelligence of the NYTimes and its editorial policy, the fact that the kids are big - even corporate titans - still doesn't transform this "news event", IMHO, into a thing worthy of greater punditry or analysis, anywhere - including here.
Forgive my foo-like commentary but, for me, the analysis of this non-event is that it is foo, as in foo-lishness. I think engine's comment is spot on.
I can see this as round 1..
1. Your index is overinflated. (G)
2. Yeah, but PageRank is corrupt. (Y!)
3. Your logo's terrible (G)
4. Googlebot takes performance enhancing substances (Y)
5. Your directory is a Dmoz clone (G)
6. Actually that's you (Y!)
7. umm oh yeah.... but you're fat (G)
8. Yeah, and you dropped out (Y!)
Ok maybe not. We can but wish.
While there is a foo quality to Sergey getting involved in saying something like this, I think that in the interest of getting to the truth of the matter, it's important to not stoop to the same kind of foo-ness.
It's easy to meow and hiss like a cat and miss the important things going on.
1: Does anyone else think it's a PR move for Sergey to step into this fight?
2: Has anyone else noticed thousands of extra pages indexed lately without any noticeable blip in ranking?
3: If these pages exist, where are they? Forums? Phantoms? non-English pages?
It's all about good press. If Yahoo is abuzz with good press and can entrench a reputation for being the biggest, it will lead to more Yahoo searches almost certainly. I guess Google decided it needed to kill it before it grows.
I suspect both Y and G are exaggerating as both obviously contain loads of junk and innumerable sites that you'd never be able to find in a search if your life depended on it.
It isn't surprising that Sergey said this (all PR spin, of course), but index size does matter to a non-technical audience who equate sheer volume with being more likely to find what they are searching for. When the news stories started a few days back, my wife (who knows nothing about search) mentioned that she'd "read that Yahoo is better than Google now".
It's not for nothing that Google, even with their home page being so devoid of anything other than the bare essentials, still find it important to have the tag "Searching 8,168,684,336 web pages".
This has very little to do with facts, and a lot about simple perception.
I've just done a site search on one of my single page sites. Yahoo comes up with 2 pages!
One of them is the CSS file. Way to go. I'm going to try adding a robots.txt and maybe a JavaScript file to see if I can't help Yahoo past the 20 billion barrier.
EDIT> To be fair, I've checked some other sites and Yahoo seem more accurate on page count than the big G.
I think the fact that Sergey made a comment means Google just blinked...
From Slashdot [slashdot.org] - A Comparison of the Size of the Yahoo! and Google Indices by the National Center for Supercomputing Applications (NCSA):
|(...) we found that on average Yahoo! only returns 37.4% of the results that Google does and, in many cases, returns significantly less. As our search results indicate, there are a number of cases in which Google returns dozens of results while Yahoo! only returns one or two results, or none at all. |
The one caveat is that the research is based on searches with <1000 results due to the fact that neither SE shows more than 1000 ranked pages.
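To make the caveat concrete, here is a small sketch of a per-query comparison restricted to countable queries. I don't know exactly how NCSA did the averaging, so this is an assumption: it takes the mean of the per-query Yahoo/Google ratios and only uses queries where both engines return fewer than 1000 results. The counts in `pairs` are made up purely for illustration.

```python
CAP = 1000  # neither engine shows more than 1000 ranked results

def mean_ratio(pairs, cap=CAP):
    """Average of yahoo/google over queries where both counts can be verified.

    Assumption: a query is usable only if both engines return fewer than
    `cap` results and Google returns at least one.
    """
    usable = [(y, g) for y, g in pairs if 0 < g < cap and y < cap]
    return sum(y / g for y, g in usable) / len(usable)

# Made-up (yahoo, google) result counts, not data from the study.
pairs = [(2, 40), (10, 20), (0, 12), (30, 60), (5000, 90000)]
print(f"{mean_ratio(pairs):.3f}")
```

The last pair is dropped because its counts are above the cap and can't be checked by clicking through, which is exactly why the study says nothing about broad, popular queries.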
Is Google out of its mind? First they blow up over a CNET article and now they claim that theirs is bigger than Yahoo's... it's grade-school playground antics!
We Google shareholders are not happy about these emotional outbursts. Google should get back to business and fix all the problems the users see with the engine.
The CNET article flap was grade-school, but this strikes to the core of their business. And as far as I can tell from my own work, Google is right. And we have an independent study now to suggest that Google was right on target.
>> Is Google out of its mind? First they blow up over a CNET article and now they claim that theirs is bigger than Yahoo's
Not too long ago they had the greatest PR team. Did they all get fired, or can they just not control their bosses anymore? Now the most popular techie site keeps mentioning in every article how Google doesn't talk to them because of a bad article (example [news.com.com]), privacy worries get more play, and now Sergey gets caught making childish comments.
Google could've just used one of its VPs or engineers to say the same thing, no need to get the top guy involved. You don't see the POTUS denying or commenting on every story. That's why you have aides, employees and anon comments.
Has anyone ever read a comment where GoogleGuy got into these little wars, despite many here saying Yahoo this and MSN that? Nope!
[edited by: martinibuster at 12:15 am (utc) on Aug. 16, 2005]
[edit reason] Fixed url [/edit]
'Tis truly pointless landlubber speak for the captains to fire shots across the bow, fret over the depth of the ocean, the size of yer ship or the height of their respective masts. True Sea Dawgs would spend more time and be harder at work to be shed of the constant attack of them barnacles clinging at the hull.
Why would Brin say something that is so obviously aimed at provoking them? What's his deal?
Further, do you think Eric Schmidt consulted anyone before opening his big mouth about CNET? That entire situation... makes everyone look bad. Guess he doesn't care, provided they don't take his airplane away.
Has Google jumped the shark, so to speak? Pfft.
|Has anyone ever read a comment where GoogleGuy got into these little wars, despite many here saying Yahoo this and MSN that? Nope! |
You have to be kidding me.
They each only show the first thousand results, so what's the big deal? As for Google, what's the point of indexing sites if they don't rank them (sandbox)? Bragging rights I guess.
|I suspect both Y and G are exaggerating as both obviously contain loads of junk |
I heard a joke once at a conference that about 10 webmasters account for 3/4 of the index.
Anyone see the Pam Anderson Roast?
I think Tommy Lee is a liar too.
Put 'em back in your pants guys.
|I think Tommy Lee is a liar too. |
(I don't know what roast you're talking about but after watching a certain video, Tommy Lee would shame both Google and Yahoo combined.)
As for the index size, Yahoo has ~1,100 of my non-duplicate pages indexed, MSN ~800, and Google 17. So as far as I can tell Yahoo has a 64x larger index than Google in my little corner of the world. No css either.
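The 64x figure checks out if you just divide the counts. A quick sketch using the approximate numbers from the post above (the function name is only for illustration):

```python
def size_ratio(bigger, smaller):
    """How many times larger one page count is than another."""
    return bigger / smaller

# Approximate indexed-page counts quoted in the post above.
indexed = {"Yahoo": 1100, "MSN": 800, "Google": 17}

for engine in ("Yahoo", "MSN"):
    ratio = size_ratio(indexed[engine], indexed["Google"])
    print(f"{engine} vs Google: {ratio:.1f}x")
# Yahoo vs Google: 64.7x
# MSN vs Google: 47.1x
```

Of course this only describes one site, so it says nothing about the relative size of the indexes overall.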
Using the site: searches, it definitely seems as though Yahoo's index is the larger when I compare one of my own sites...
Yahoo: 319,000 pages
Google: 7,480 pages
MSN: 336 pages
Having said that, I have seen 3x more spider activity from Google this month compared to the Inktomi spider. If it continues at the same pace then Google will have spidered almost 20x the average monthly amount. Maybe they're gearing up for their own increased index?
|Some woman needs to tell these guys the size isn't as important as what you do with it |
That's true, it seems that everyone at the "plex" is suffering from "index envy". When did the battle switch to quantity instead of quality? Who's interested in how many bull**** nonsense pages are indexed anyway?
|especially considering how my 1200-page site on Google shows 25,000+ pages. |
It is amazing how inflated the counts are. I had a 300,000-page website that was up to 2.6 million pages in Google. It's down to roughly 1 million now. Yahoo shows a more accurate 250,000.