
Good, supplemental and a new results category?

I need interpretation for this weird data....

         

Neticman

6:07 pm on Nov 17, 2006 (gmt 0)

10+ Year Member



If I search using the site: operator, I get all indexed pages in a site.

Say:

site:webmasterworld.com
Results 1 - 100 of about 165,000

If I combine the site: operator with a keyword, I get hard-to-explain results:

site:webmasterworld.com the
Results 1 - 100 of about 197,000

The combined search should return fewer results, not more.

If I run the same search on other sites, the results show the opposite:

site:disney.com
Results 1 - 100 of about 941

site:disney.com the
Results 1 - 100 of about 389

Here the combined search returns far fewer pages. Do you think that most of the Disney.com pages avoid the word 'the'? Not likely.

I believe that, as far as keywords are concerned, combined searches pull their data from a more reliable source.

If anyone has an explanation for this data, please let me know...

tedster

8:05 pm on Nov 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's my understanding. Because of the way data is sharded on the back end, the "number of urls" information has always been an estimate - and sometimes a wildly inaccurate estimate. Even with some attention earlier this year on improving that accuracy, it can still be out in left field -- and it's not something that's exactly mission critical for the average Google user.

We can't read much into those numbers at all. In short, it's pretty broken.

Neticman

12:36 pm on Nov 18, 2006 (gmt 0)

10+ Year Member



I found a pattern with the combined operators:

site:(domain.com) (keyword)
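For anyone who wants to repeat the comparison on other sites, here is a minimal Python sketch that builds the two query strings. The function names are made up for illustration, and the search URL shape is an assumption that may need adjusting.

```python
from urllib.parse import urlencode


def site_queries(domain, keyword):
    """Return the plain site: query and the site: + keyword query."""
    base = f"site:{domain}"
    return base, f"{base} {keyword}"


def search_url(query):
    """Build a search URL for a query (assumed URL shape, not verified)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


plain, combined = site_queries("disney.com", "the")
print(plain)     # site:disney.com
print(combined)  # site:disney.com the
print(search_url(combined))
```

The sketch only builds the queries. The "of about N" figure still has to be read off the results page by hand.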

The total results found under this query provide a more reliable figure of the keyword-associated (valuable) pages in a site.

On most sites, only about 20% of the total pages are associated with a keyword. This figure is an indicator of how well the site was spidered and indexed.

In other words, many pages in a site are not supplemental, but are useless anyway.
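The share can be worked out as the keyword count divided by the plain site: count. Below is a rough Python sketch using the two pairs of estimates quoted earlier in this thread. The function name is hypothetical, and since both inputs are estimates, the result can exceed 100%.

```python
def keyword_share(total_estimate, keyword_estimate):
    """Fraction of a site's reported pages that also match the keyword."""
    if total_estimate <= 0:
        raise ValueError("total_estimate must be positive")
    return keyword_estimate / total_estimate


# (plain site: count, site: + 'the' count) as quoted earlier in the thread
counts = {
    "webmasterworld.com": (165_000, 197_000),
    "disney.com": (941, 389),
}

for domain, (total, with_keyword) in counts.items():
    print(f"{domain}: {keyword_share(total, with_keyword):.0%}")
# webmasterworld.com: 119%
# disney.com: 41%
```

A share above 100%, as with webmasterworld.com, shows that at least one of the two estimates is off.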

The data supporting this assumption are being published as part of my googleometry project, and are available for discussion.