
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Think I've found how small the indexes are for current searches
internetheaven

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3669139 posted 10:11 am on Jun 7, 2008 (gmt 0)

I've read a few theories in threads that Google is using small boxes of "quality" pages to produce searches now instead of slicing through all data they have available for each and every search. Many have pointed out that the data used might change based on geo-location, time of day and whether the user is logged in to their G-Accounts. Thought I'd try and get Google to show me just how many web pages were actually available for searching at any given time. I'll update the thread each time the index changes drastically:

For Sat 7th June

Google.com 25,630,000,000

Google.co.uk (The Web) 25,270,000,000
Google.co.uk (UK Pages) 68,900,000

Google.ca (The Web) 25,360,000,000
Google.ca (CA Pages) 45,200,000

It is completely IP specific too. I had to log in through a proxy server for each TLD to get different results. If I logged in to .com from a UK IP address I'd get the exact same total figures as .co.uk, even though the results shown for any given search were completely different. It is my guess that these figures do NOT include supplemental pages and that every page outside these boxes is considered a supplemental by the search programming. These are, in my opinion, the figures relating to the bulk of pages returned for main searches.

What uses does this information have? Well, for me it shows maturity on Google's part. I remember the days when the big guns would display the number of indexed pages on the front page and it was big news when each broke a new barrier. This maturity can only increase the quality of their search results and these small boxes of data may really help because, as many have pointed out in other threads, Google's spam removal methods tend to push spam to the top for some period before it disappears into oblivion. Smaller data sets to work from will make that easier. It also goes a long way to explaining the huge surge in threads about "two months and Google still hasn't indexed me?" and "how long before Google updates the information they have on my pages?".

I don't want to post the source of the data as I'm worried Google will block access before I have a chance to collect enough info. I will post it eventually though.

 

tedster

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3669139 posted 11:09 am on Jun 7, 2008 (gmt 0)

I think you may be onto something here.

Last August Google filed a patent application for selectively searching partitions of a database [webmasterworld.com]. This was right around the time that the Supplemental Results tags were removed and there was talk from the Google staff about the Supplemental Index evolving into some kind of different critter.

One key fact I notice in that patent title is the word "partitions" - that's plural as in more than one, not just the Supplemental Index.

g1smd

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3669139 posted 6:06 pm on Jun 7, 2008 (gmt 0)

When you search at, say, google.com, make a note of the IP address that those results come from (the ShowIP extension for Mozilla is the ONLY reliable way to do that), because some of the different IPs have different datasets and/or algorithms. You might also notice some patterns as to which IP you get depending on time of day and/or day of week.

nomis5

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3669139 posted 8:47 pm on Jun 7, 2008 (gmt 0)

internetheaven,

I can't grasp why you think there are small pools of quality pages that Google is selecting from. I see the stats you quote but I miss the reasons for the conclusions. Sorry to be a bit dumb but can you be more specific? I ask because if you are correct then it's important for us all. Thanks.

Nomis5

Receptional Andy



 
Msg#: 3669139 posted 9:04 pm on Jun 7, 2008 (gmt 0)

I see very similar numbers for my "show me everything" searches - currently running at 25,350,000,000 (85,600,000 UK only).

Whitey

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 3669139 posted 5:11 am on Jun 8, 2008 (gmt 0)

"I see the stats you quote but I miss the reasons for the conclusions. Sorry to be a bit dumb"

Make that 2 dummies. I don't understand this either. Can you help clarify per above?

Receptional Andy



 
Msg#: 3669139 posted 10:02 pm on Jun 8, 2008 (gmt 0)

"I can't grasp why you think there are small pools of quality pages that Google is selecting from."

It's worth getting a handle on the supplemental index [google.co.uk] to understand the idea internetheaven mentions.

My take is that Google desires to be both relevant and comprehensive. It could be that there's an element of mutual exclusivity there. Indeed, most searches are just looking for a quick result that's relevant - not to be able to review all of the available information.

So, it makes sense for Google to restrict the data it searches through in order to satisfy the majority of searches, without wading through billions of URLs that it doesn't consider to be especially high quality.

internetheaven

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3669139 posted 5:51 pm on Jun 9, 2008 (gmt 0)

"So, it makes sense for Google to restrict the data it searches through in order to satisfy the majority of searches, without wading through billions of URLs that it doesn't consider to be especially high quality."

Aye, that's my line of thinking exactly. Sorry if that wasn't clear to begin with.

It seems as though each datacenter holds between 20 and 25 billion pages. Having such common numbers would suggest that this is the optimal point between enough data to mine and relevancy.

As requested, some datacenter checks:

Datacenter 66.249.93.104 - 25,270,000,000
Datacenter 64.233.179.104 - 20,090,000,000
Datacenter 216.239.51.104 - 20,090,000,000
Datacenter 66.102.9.99 - 25,360,000,000
Datacenter 66.102.9.147 - 25,350,000,000
Datacenter 66.102.9.104 - 25,360,000,000
Datacenter 64.233.161.83 - 20,090,000,000
Datacenter 64.233.183.103 - 23,790,000,000
Datacenter 64.233.189.104 - 25,360,000,000

The searcher's IP address does not seem to affect direct datacenter searches; it only matters when you access a Google.tld domain.
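To make the clustering in those checks easier to see, here is a minimal Python sketch that groups the datacenter IPs by the total each one reported. The figures are copied from the list above; the names `DATACENTER_COUNTS` and `group_by_count` are just illustrative, and nothing here queries Google.

```python
from collections import defaultdict

# Totals copied from the datacenter checks in this post.
DATACENTER_COUNTS = {
    "66.249.93.104": 25_270_000_000,
    "64.233.179.104": 20_090_000_000,
    "216.239.51.104": 20_090_000_000,
    "66.102.9.99": 25_360_000_000,
    "66.102.9.147": 25_350_000_000,
    "66.102.9.104": 25_360_000_000,
    "64.233.161.83": 20_090_000_000,
    "64.233.183.103": 23_790_000_000,
    "64.233.189.104": 25_360_000_000,
}


def group_by_count(counts):
    """Map each reported total to the sorted list of IPs that returned it."""
    groups = defaultdict(list)
    for ip, total in counts.items():
        groups[total].append(ip)
    return {total: sorted(ips) for total, ips in groups.items()}


groups = group_by_count(DATACENTER_COUNTS)
for total in sorted(groups):
    print(f"{total:>14,}  {', '.join(groups[total])}")
```

On these nine IPs there are only five distinct totals, with three IPs sharing 20,090,000,000 and three sharing 25,360,000,000.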

g1smd

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3669139 posted 6:31 pm on Jun 9, 2008 (gmt 0)

Be aware that everything on the same Class C Block should be identical.

There's also a new Google datacentre list [webmasterworld.com] that may help.
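One way to check the Class C point against the figures internetheaven posted is to bucket the IPs by /24 network and flag any block whose members disagree. This is only a rough sketch using Python's standard `ipaddress` module; the helper names are made up for illustration, and the totals are the ones quoted in the previous post.

```python
import ipaddress
from collections import defaultdict

# Totals quoted in the previous post's datacenter checks.
COUNTS = {
    "66.249.93.104": 25_270_000_000,
    "64.233.179.104": 20_090_000_000,
    "216.239.51.104": 20_090_000_000,
    "66.102.9.99": 25_360_000_000,
    "66.102.9.147": 25_350_000_000,
    "66.102.9.104": 25_360_000_000,
    "64.233.161.83": 20_090_000_000,
    "64.233.183.103": 23_790_000_000,
    "64.233.189.104": 25_360_000_000,
}


def by_slash24(counts):
    """Group {ip: total} into {'/24 network': {ip: total}}."""
    blocks = defaultdict(dict)
    for ip, total in counts.items():
        # strict=False lets us pass a host address and get its network back.
        network = ipaddress.ip_network(f"{ip}/24", strict=False)
        blocks[str(network)][ip] = total
    return dict(blocks)


def inconsistent_blocks(counts):
    """Return only the /24 blocks whose IPs reported different totals."""
    return {
        network: ips
        for network, ips in by_slash24(counts).items()
        if len(set(ips.values())) > 1
    }


for network, ips in inconsistent_blocks(COUNTS).items():
    print(network, ips)
```

Only 66.102.9.0/24 has more than one IP in the list, and its three members differ by 10,000,000 (25,350,000,000 against 25,360,000,000). That could simply be the checks being taken at slightly different moments, so it is not strong evidence either way.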

internetheaven

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3669139 posted 7:05 pm on Jun 24, 2008 (gmt 0)

Okay, for three nights now the index has changed values at 8pm GMT.

Today's was a drop from 25,350,000,000 at 19:59 to 19,300,000,000 at 20:00.

Did some search terms too and the drop was huge. For a competitive search term the results dropped from:

710,000 results at 19:59 to just 364,000 at 20:00

I think there was a thread about how Google was returning different results for morning/night users. Looks like night users in the UK get far fewer results to choose from.

I only checked Google.co.uk for the size and for the competitive term time changes. I'll work on Google.com from a US IP address next.
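For scale, the two drops can be put in percentage terms with a few lines of Python. This is plain arithmetic on the numbers quoted above; `percent_drop` is just a throwaway helper name.

```python
def percent_drop(before, after):
    """Percentage decrease from `before` to `after`."""
    return (before - after) / before * 100


# Figures from the 19:59 and 20:00 checks on Google.co.uk.
index_drop = percent_drop(25_350_000_000, 19_300_000_000)
term_drop = percent_drop(710_000, 364_000)

print(f"Index total:      {index_drop:.1f}% fewer pages")
print(f"Competitive term: {term_drop:.1f}% fewer results")
```

That works out to roughly 23.9% for the index total and 48.7% for the competitive term, so the term lost about twice as much proportionally as the overall figure.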

youfoundjake

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3669139 posted 4:50 am on Jun 25, 2008 (gmt 0)

How odd that the results drop at 8pm. Any speculation on the drop? Servers go offline for maintenance? End of shift for Google employees returning our results in .01 seconds? Resources are being spent calculating the day's crawl and crunching it into the index?

5ubliminal

5+ Year Member



 
Msg#: 3669139 posted 11:27 am on Jun 25, 2008 (gmt 0)

I see now: 25,340,000,000.

But how can you be sure these numbers are real and not estimates, like the result count for every search? They could differ wildly every time you hit the search button or go deeper in the SERPs.

Reno

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3669139 posted 2:55 pm on Jun 25, 2008 (gmt 0)

"710,000 results at 19:59 to just 364,000 at 20:00"

Very interesting quirk. An anomaly? Or does the top of the 8pm hour trigger a more refined search (for your time zone)?

I wonder what it would be at about 8am, 3pm, 8pm?

If "more refined" at that point in time, is it the same for the USA as for the UK?

What about USA EST vs USA PST?

So many variables!

ps. internetheaven -- the thoroughness of your methodical research is impressive ... thanks for bringing it here.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved